The International Social Survey Program - ISSP


by  Rolf  Uher  and  Irene  Mueller' 
Zentralarchiv  fuer  empirische  Sozialforschung 
Universitaet zu Koeln
Koeln 


The  International  Social  Survey  Program  -  Goals  and  Intentions  of  the  ISSP  ■ 

The International Social Survey Program (ISSP) is a continuing, annual program of cross-national
collaboration. It brings together pre-existing national social science projects and coordinates research
goals, adding a cross-national perspective to the individual national studies.

ISSP grew out of a bilateral collaboration between the Allgemeine Bevoelkerungsumfrage der
Sozialwissenschaften (ALLBUS), administered by the Zentrum fuer Umfragen, Methoden, und
Analysen (ZUMA) in Mannheim, West Germany, and the General Social Survey (GSS) of the
National Opinion Research Center (NORC), University of Chicago. Both the ALLBUS - a joint
project of ZUMA and the Zentralarchiv - and the GSS are replicating, time series surveys. The
ALLBUS has been conducted biennially since 1980 and the GSS annually (except for 1979 and 1981)
since 1972. In 1982 ZUMA and NORC devoted a small segment of their questionnaires to common
questions on job values, importance of areas of life, abortion, and feminism. In 1984 the
collaboration covered questions on class differences, equality, and the welfare state.


'Presented at the International Association for Social Science Information Service and Technology
(IASSIST) Conference held in Washington, D.C., May 26-29, 1988

■This overview is based on: Tom W. Smith (SCPR) in: NSSD (ed.), EPD Newsletter, No. 63. Bergen,
June 1987

Meanwhile,  in  late  1983  Social  and  Community  Planning  Research  (SCPR),  London,  which  was 
starting  a  social  indicators  series  (the  British  Social  Attitudes  Survey)  similar  to  the  GSS  and 
ALLBUS,  secured  funds  from  the  Nuffield  Foundation  to  sponsor  meetings  to  further  international 
cooperation.    A  meeting  was  held  in  London  in  June,  1984  with  representatives  from  ZUMA,  NORC, 
SCPR,  and  also  the  Research  School  of  Social  Sciences,  Australian  National  University.    This  group, 
soon to be christened the ISSP, agreed 1) to jointly develop topical modules dealing with important
areas of social science, 2) to field each module as a 15-minute supplement to the regular national
surveys (or as a special survey if necessary), 3) to include an extensive common core of background
variables, and 4) to make the data available to the social science community as soon as possible.
Each national institution funds its own data collection and bears any costs that it incurs through
participation in the cooperative effort.

Since its initial meeting in 1984, ISSP has grown to include eleven nations: the original four -
Germany, the United States, Great Britain, and Australia - plus Austria, Italy, Ireland, the Netherlands,
Hungary, Norway, and Israel. Other countries are currently applying for membership.

Participating  Organizations 

RSSS     Research School of Social Sciences, Australian National University, Australia
IS       Institut fuer Soziologie, Graz University, Austria
ZUMA     Zentrum fuer Umfragen, Methoden, und Analysen, Mannheim, Federal Republic of Germany
SCPR     Social and Community Planning Research, London, Great Britain
TARKI    Tarsadalomkutatasi Informatikai Tarsulas, Budapest, Hungary
EURISKO  Ricerca Sociale e di Marketing, Milano, Italy
SSUC     Department of Social Science, University College, Dublin, Ireland
Israel   Tel Aviv University, Dept. of Sociology and Anthropology, Ramat Aviv, Israel
NSD      Norwegian Social Science Data Archives, Bergen, Norway
SCP      Sociaal en Cultureel Planbureau, Rijswijk, The Netherlands
NORC     National Opinion Research Center, University of Chicago, USA
ZA       Zentralarchiv fuer empirische Sozialforschung an der Universitaet Koeln, Federal Republic of Germany

ISSP's first theme was the role of government. This covered attitudes towards civil liberties and law
enforcement, education and directing the economy, and welfare and social equality. The second
theme was social networks and support systems. This included a detailed account of one's contact
with various relatives and friends and then a series of questions about where one would turn for
help when faced with various situations such as financial need, minor illness, career advice, and
emotional distress. The third module, on social equality, is now being developed. Questions focus on
equality of income, wealth, and opportunity. Respondents are asked for their perceptions of the
extent of present inequality, explanations for inequality, and support for government programs to
reduce inequality. The fourth module (1988) will deal with working women and the family, and the
fifth (1989) with work and leisure.


Winter 1988

iassist quarterly

In  1990  ISSP  will  repeat  the  role  of  government  theme.    By  replicating  substantial  parts  of  earlier 
modules,  ISSP  will  not  only  have  a  cross  national  but  also  a  time  perspective.    We  will  be  able  to 
compare  nations  and  test  whether  similar  social  science  models  are  valid  for  different  societies.    We 
will  also  be  able  to  see  if  there  are  similar  international  trends  and  whether  equivalent  models  of 
social  change  hold  true  for  different  nations. 

ISSP  brings  several  new  features  to  the  area  of  cross  national  research.    The  collaboration  among 
nations  is  not  sporadic  or  intermittent  but  routine  and  continual.    Although  the  international 
collaboration  carried  out  by  ISSP  is  more  circumscribed  than  special  cross  national  research  projects, 
ISSP  makes  cross  national  research  a  basic  part  of  the  nation's  research  agenda. 


The  Zentralarchiv  in  Cologne  serves  as  the  "ISSP  Archive" 

At  the  1986  ISSP  conference  in  Mannheim  it  was  agreed  that  the  Zentralarchiv  would  function  as  the 
central  archive  for  the  ISSP,  while  each  participant  would  continue  to  send  the  ISSP  data  to  their 
own  national  archives.    The  Archive  would  be  furnished  with  the  data  of  all  countries  who  produced 
a  data  tape,  with  SPSSx  set-up  files.    These  could  be  distributed  with  no  restriction.    The 
Zentralarchiv  would  produce  a  merged  file  for  the  ISSP  module  plus  the  agreed  demographic 
variables,  and  would  produce  a  "common  core"  codebook,  and  thus  provide  data  sets  for  cross 
national  comparison. 

The data files transmitted to the Archive will contain the ISSP module, common background
variables, plus nationally specific background data and national addenda (i.e. other data collected).
The common data are delivered in a pre-defined format, according to a common-core codebook
(together with SPSS or SPSSx set-up files). The Archive merges the common-core data and
distributes them, together with full documentation, as an integrated file.

An ISSP working group will develop the code for the common core of background variables after
consultations with the group, including schemes for merging the background variables according to the
principle of "functional equivalence". On receipt of the working group's proposal each institute
returns a re-coded version of its background variables so that anomalies can be identified before
the integrated codebook is distributed.
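
The recoding and merging procedure described above can be illustrated with a short sketch. It is not the working group's actual scheme: the country labels, the national education codes, the common-core categories and the missing-data code are all hypothetical, chosen only to show how nationally specific codes might be mapped onto a shared variable and how unmapped codes could be flagged as anomalies before an integrated file is built.

```python
# Hypothetical common-core categories for an education variable.
COMMON_EDUCATION = {1: "primary or less", 2: "secondary", 3: "post-secondary"}

# Hypothetical national coding schemes. Each national code is assigned to the
# common category judged to be functionally equivalent; the mappings are
# invented for illustration and do not reproduce any ISSP codebook.
RECODE_MAPS = {
    "D": {1: 1, 2: 2, 3: 2, 4: 3},
    "USA": {1: 1, 2: 1, 3: 2, 4: 3, 5: 3},
}

MISSING = 9  # assumed common missing-data code


def recode_education(country, national_code):
    """Return the common-core code for a national education code."""
    return RECODE_MAPS[country].get(national_code, MISSING)


def merge_files(national_files):
    """Combine national records into one list of common-core records."""
    merged = []
    for country, records in national_files.items():
        for record in records:
            merged.append({
                "country": country,
                "educ_common": recode_education(country, record["educ"]),
            })
    return merged


def find_anomalies(merged):
    """Return records whose national code had no common-core equivalent."""
    return [record for record in merged if record["educ_common"] == MISSING]
```

In a sketch like this, each institute would supply only its own entry in the mapping table, and the list returned by `find_anomalies` would be the material to resolve with that institute before distribution.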

During  the  1987  ISSP  conference  in  Budapest  it  was  resolved  that  each  nation  would  send  to  the 
Zentralarchiv  the  following  information  in  English: 

•  sample size - planned and completed

•  type of sample - detailed sampling procedures, stratification factors, information on clustering

•  response rates and how they are calculated

•  known systematic properties of the sample: bias, differential attrition, sampling efficiency and
   information on design effects

•  weighting - details of weighting and its effects

•  fieldwork dates

•  fieldwork methods - whether drop off, postal, self-completion or personal interview

•  context - other topics in the questionnaire, their placement vis a vis the ISSP module

•  known deviations from standard ISSP question wording, which must be clearly marked

•  a note on coding and editing procedures and a blank questionnaire

•  the names of the principal investigators at each institution


The absence of any of these details would render the dataset incomplete. This would mean that it
could not be included in the combined dataset until the information was passed to the Archive.
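
The completeness rule above lends itself to a simple check. The following sketch assumes that the documentation for each national dataset is held as a dictionary; the key names are hypothetical labels for the eleven items listed above and are not taken from any ISSP or Zentralarchiv specification.

```python
# Hypothetical key names, one per documentation item resolved in Budapest.
REQUIRED_ITEMS = [
    "sample_size",
    "sample_type",
    "response_rates",
    "sample_properties",
    "weighting",
    "fieldwork_dates",
    "fieldwork_methods",
    "context",
    "wording_deviations",
    "coding_notes_and_questionnaire",
    "principal_investigators",
]


def missing_items(documentation):
    """List the required items that are absent or left empty."""
    return [item for item in REQUIRED_ITEMS if not documentation.get(item)]


def is_complete(documentation):
    """A dataset may enter the combined file only if nothing is missing."""
    return not missing_items(documentation)
```

A dataset whose dictionary lacks, say, the weighting details would be reported by `missing_items` and held back until the entry is supplied.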


The  Cluttered  Reality  of  Comparative  Data  Sets 

Processing the 1985 module on "Role of Government" proved that agreement and practice cannot
easily be brought to coincide; the Zentralarchiv invested more time and resources in preparing the
prototype of the 1985 ISSP module than was agreed during the different ISSP meetings. The timing
in particular was badly off. The first dataset of the 1985 module reached the Archive in April 1986;
the last one arrived in September 1987.

The  amount  of  information  sent  with  the  datasets  varied  from  a  few  notes  on  a  sheet  of  paper  to  a 
complete  codebook.  Languages  varied  from  the  vernacular  to  well  prepared  English  with  versions  of 
foreigner's  English  in-between.    Nearly  every  case  of  documentation  created  problems. 

As the ISSP self-completion questionnaire was part of the "general social surveys" in some countries
and was conducted under financial restrictions in others, the background questions in the main
followed the tradition within those countries more than they conformed to the common core which
was agreed upon. In order to maximize comparability it was necessary to identify, recode and merge
variables within countries. Filter conditions had to be reconstructed, recoding procedures required
some time to go back to the original data, and variable labels and names were far from being
standardised and hard to interpret. Questions from the ISSP modules were missing in some
countries, definitions of missing data varied, some SPSS system files ran into capacity problems, etc.
Literature, e.g. about educational systems in the participating countries, about the occupational structure
or about the party systems, had to be consulted to permit documentation in the appendix of the
resulting codebook. Country-specific wording of questions and translations into English had to be
added to the documentation, too.

In the meantime the 1985 ISSP module - data and codebook - has been distributed to the
participating institutions. Upon request data sets were also supplied to the national data archives.
Supplying the national and international scientific community with data from the project is, in
addition to merging the national files into one international dataset, a major task and function of the
ISSP Archive.


ISSP  data  at  the  Zentralarchiv  (July  1988) 

1985 module: "Role of Government"

Carried out in:       ZA study No.:   Sample Size:
Australia             1496            1528
Austria               1495             987
Federal Republic      1491            1048
Italy                 1493            1580
United Kingdom        1492            1530
USA                   1494             677

Integrated codebook and dataset available: ZA study No. 1490

1986 module: "Social Networks"

Carried out in:       ZA study No.:   Sample Size:
Australia             1622            1250
Austria               1495            1027
Federal Republic      1500            2809
Hungary               1498            1747
Italy                 1640            1027
United Kingdom        1623            1416
USA                   1563            1470

Integrated codebook and dataset in process: ZA study No. 1620

1987 module: "Social Inequality"

Carried out in:       ZA study No.:   Sample Size:
Australia             not yet available
Austria               not yet available
Federal Republic      1641            1397
Hungary               1497            2606
Italy                 1640            1027
The Netherlands       1673            1990
United Kingdom        1668            2847
USA                   1636            1819

Integrated codebook and dataset planned: ZA study No. 1680

1988  module:  "Family  and  Sex  Role" 

1989  module:  "Work  Orientation" 

Planned  to  be  carried  out  in: 
Australia 
Austria 

Federal  Republic 
Hungary 
Ireland 
Italy 

The  Netherlands 
United  Kingdom 
USA 


Perspectives  for  the  future 

The International Social Survey Program is by now a dependable, continually enlarging and effective
cooperation producing valuable cross-national data that also have some time-series quality. The trade
off between ambition and pragmatism, which is always necessary in such a heterogeneous international
group of social scientists, has reached a very high level of quality. This obliges the Zentralarchiv to
think about new and effective forms of service for this special data collection.

For now the main work is to merge the individual modules into a single file for international
comparisons. For the future we have to plan an additional design. In doing so, we have to take
into account that a complex data base will grow in which topics will be replicated over time.
Indicators may be added, deleted or changed and additional countries will join the program. A kind
of data organisation has to be found which is easy for the user community to handle and
which allows the organisation of a large enough body of data to serve most of the potential retrieval
and research tasks. Similar to the concept of the ALLBUS, the ISSP data base could in the final
step be organised in a data-base management system (DBMS), leaving options open for different needs
such as on-line analysis or on-line retrieval, custom-tailored PC solutions, teaching packages and
whatever else is conceivable. Thus the ISSP can serve as a prototype for forthcoming cross-national
studies also in terms of data management, while it is already an example of cooperation and
coordination between national and international researcher groups.

The Comparative Project on Class Structure and Class Consciousness


by Roger Jones'

Social Science Data Archives

The Australian National University


This is a brief summary of this data collection
gleaned from the papers given in the references,
which are themselves a very limited selection
from those available from the project. Those
interested in obtaining further details of the
project and its results should contact the
principal investigators at the University of
Wisconsin, Madison.


Introduction

Most of the quantitative empirical research on
class is based on gradational concepts of
class - that is, classes are understood as being
ordered in terms of the degree to which their
members possess some quantifiable attribute
such as income, education, or occupation.
The names used to describe class
positions - upper, middle, lower - reflect this
ordering. However, most sociological theories of
class adopt relational views in which classes are
defined in terms of a qualitatively defined
system of social relations - capitalist and
worker, haves and have nots - and these
relational properties are at best only indirectly
measured in conventional data sources.

This project, initiated by Erik Olin Wright at
the University of Wisconsin-Madison, sets out
to fill this gap in class analysis. By developing
a survey questionnaire which includes measures
of the relational concepts of class as well as the
usual gradational concepts, Wright aims to
provide relational class descriptions of developed
Western industrial societies and compare the
explanatory capabilities of the various alternative
concepts of class.

The basic objectives of the project are (Wright,
1982A):

•  to investigate rigorously the objective
   contours of the class structures of advanced
   industrial societies in relational terms;

•  to understand and explain the character of
   the variations in these class structures;

•  to analyse the effects of these variations in
   class structures on a variety of individual
   level outcomes: income, social and political
   attitudes, political behaviour;

•  to examine the interactions between
   macro-structural variations and
   micro-biographical variations in determining
   individual outcomes.


'Presented at the International Association for
Social Science Information Service and
Technology (IASSIST) Conference held in
Washington, D.C., May 26-29, 1988


Operationalising  the  Model 

In  order  to  allocate  individuals  to  class  positions, 
Wright's  approach  requires  much  more 
information  about  a  person's  occupation  than  is 
usual.    Respondents  have  to  be  asked  about: 


•  supervision, particularly the ways in which
   people in supervisory positions can or cannot
   impose various kinds of rewards and
   punishments on their subordinates;

•  decision-making over various kinds of policy
   issues at the respondent's place of work;

•  autonomy over various aspects of the
   respondent's own work, particularly over the
   design and planning of the content of work;

•  formal hierarchical position, that is, the
   official location of the respondent's job
   within the organisational hierarchy of the
   workplace; and

•  ownership of the means of production, both
   in terms of one's principal work and in
   terms of owning income-generating property.


Variables  for  descriptive  analysis 

To  understand  and  explain  the  character  of 
variations  in  class  structures,  the  study  includes 
a  great  deal  of  information  on  what  Wright 
calls the respondent's class biography: class
origins;  the  experience  of  self  employment, 
unemployment  and  previous  supervisory 
positions;  detailed  work  histories  over  the  past 
two  jobs;  class  and  occupation  of  close  friends 
and  spouse.    These  data  make  it  possible  to 
examine  the  effect  that  class  experiences  have 
on  patterns  of  class  consciousness,  identifications 
and  so  on,  and  also  provide  the  basis  for  a 
comparative  study  of  class  mobility. 

Data on labour market issues allow systematic
examination  of  the  interrelationship  between 
labour  market  segmentation  and  class  structures. 
Topics  include  job  shifts  from  the  previous  two 
jobs;  unemployment  experience;  promotional 
expectations  on  the  present  job;  level  of  formal 
and  informal  education;  seniority  and  other 
variables. 

Part  of  the  study  has  been  devoted  to  issues 
around  the  sexual  division  of  labour  in  the 
home.  These  include  rough  measures  of  the 
amount  of  time  and  responsibility  devoted  to 
different  tasks;  relative  influence  over  certain 
kinds  of  decisions;  occupation  and  class  of 
spouse. 


Social  Attitudes  and  Social  Consciousness 

A  major  aim  of  the  study  is  to  examine  the 
relative  importance  of  individual  biographies  and 
experiences  compared  to  structural  and  historical 
factors  in  determining  social  attitudes  and  class 
consciousness.    The  major  variables  of  this  type 
are: 

•  codings of open ended questions seeking
   explanations for crime, poverty and the
   energy crisis, and of what solutions to these
   problems would be most effective;

•  views on what is good and bad, desirable
   and undesirable on various political policy
   matters, male-female relations, economic
   inequality and other issues;

•  views on the feasibility of alternative ways of
   organising society, the economy and politics;
   and

•  conventional questions on class identification
   and party identification.


Other  measures  of  class 

One  of  the  most  important  features  of  this  data 
set  is  that  it  contains  good  measures  of  class 
and  stratification  concepts  from  a  variety  of 
different  perspectives  and  from  a  variety  of 
countries.    This  allows  analyses  which  compare 
the  predictive  power  of  alternative  concepts  of 
class,  in  particular  Marxist  versus  non-Marxist 
but  also  alternatives  within  the  Marxist  tradition. 
The  study  includes  standard  measures  of 
occupation;  most  important  work  activities; 
industrial  sector;  education  and  training  for  the 
job;  promotion  patterns;  income;  characteristics 
of  any  second  job;  and  involvement  in  "second 
economy"  activities. 


Countries surveyed and available data

To  the  best  of  my  knowledge,  the  survey  has 
been  fielded  in  the  following  countries  to  date: 


Country        Year   Principal Investigator

U.S.A.         1980   Erik Olin Wright, Univ. of Wisconsin
Sweden         1980   Goran Ahrne, Uppsala Univ.
Norway         1982   Tom Colbjornsen, Univ. of Bergen
Canada         1983   John Myles, Carleton Univ.
Finland        1981   Raimo Blom, Univ. of Tampere
U.K.           1983   Howard Newby, Univ. of Essex
New Zealand    1984   Chris Wilkes, Massey Univ.
W. Germany     1985   Herman Strasser, Univ. of Duisburg
Denmark        1985   Jens Hoff, Univ. of Copenhagen
Australia      1986   John Western, Univ. of Queensland
Japan          1987   Katsu Harada, Meiji Gakuin Univ.

Funding  proposals  have  been  submitted  in  Italy, 
France  and  Portugal  and  a  pilot  study  has  been 
carried  out  in  Poland. 

A  five  nation  merged  data  file  including  data 
from  the  surveys  conducted  in  the  United 
States,  Sweden,  Norway,  Canada  and  Finland 
became  available  through  ICPSR  in  1986  (Study 
Number  8413,  Class  II).    The  second  version  is 
expected  to  include  additional  data  from  the 
remaining  six  surveys  conducted  to  date  and 
should  be  available  from  ICPSR  in  1989. 


A proposal was submitted to the NSF in
January 1988 for funding of a joint USA-USSR
survey in 1989 which would replicate many of the
questions asked in the earlier survey and
provide additional information on attitudes about
worker control of workplace decisions,
distributive justice and economic change; work
history; household economy and participation in
the informal economy. This survey is also to
be fielded in Hungary.

European Values Study


by Marcia Taylor'
ESRC Data Archive
University of Essex
Wivenhoe Park
Colchester, Essex
CO4 3SQ
England


The European Values Systems Study Group
(EVSSG), an international group of researchers
investigating values and norms, began
discussions in 1979 and, after a number of
preliminary studies, undertook pilot studies in
France, Great Britain, West Germany and Spain
during 1980. In 1981, fieldwork began in ten
European nations on a major study - the
European Values Survey. Its stated aim was "to
investigate the nature and inter-relationships of
values systems, their degree of homogeneity and
the extent to which they are subject to change
across time." The questionnaire used in each
country was developed by the group and
translated into the relevant languages. Every
attempt was made to ensure the comparability of
questions in each country.


The Study

The populations of ten European nations were
surveyed in the period between March and May
of 1981:

France
Great Britain
Ulster
West Germany
Italy
The Netherlands
Denmark
Belgium
Spain
Eire

A repeat survey is planned in these countries in
1990.

The  same  questionnaire  has  been  employed  in 
surveys  in  a  number  of  other  European 
countries  as  well  (excluding  only,  the  EVSSG 
reports,  Luxembourg,  Greece,  Portugal, 
Switzerland  and  Austria,  with  the  last  two 
countries  planning  surveys  in  1988).    In  addition, 
matching  surveys  were  carried  out  in  Hungary 
and  the  Soviet  Union  in  Eastern  Europe. 

Outside  Europe,  the  questionnaire  has  been  used 
in the following countries: U.S.A., Canada,
Japan, Mexico, South Africa, Chile, Argentina,
South  Korea,  Kuwait,  Lebanon  and  New 
Zealand. 


'Presented at the International Association for
Social Science Information Service and
Technology (IASSIST) Conference held in
Washington, D.C., May 26-29, 1988




The Ten-Nation Data Set

The  ESRC  Data  Archive  has  been  asked  by  the 
EVSSG  to  distribute  the  data  from  the 
ten-nation  European  study  on  its  behalf.    An 
integrated  codebook  for  the  nine  datasets  has 
been  prepared  by  the  Archive.    A  full  Study 
Description  of  these  studies  appears  below. 

European  Values  Survey,  1981 

Purpose 

A  large  scale  cross-national  and  on-going 
survey  of  moral,  religious,  political,  and 
social  values  in  Western  Europe. 

Variables 

Religious  attitudes,  beliefs,  practice, 
affiliation.    Moral  outlook.    Political 
interest,  inclination,  participation. 
Attitudes  towards  reform,  civic  institution, 
means  of  production.    Other  political 
values  indicators.    Personal  values, 
attitudes  towards  the  family,  marriage, 
divorce,  sex.    Work  values.    Perception  of 
the  future.    Satisfaction  ratings,  indicators 
of  psychological  well-being,  health. 
Range  of  socio-demographic  variables: 
sex,  age,  housing  tenure  and  type, 
terminal  education  age,  household  size 
and  composition,  marital  status, 
employment  status,  occupational  code  for 
respondent  and  chief  wage  earner, 
workplace  details,  trade  union 
membership, regional codes, area types,
income on a scale, (ethnic group
and  socioeconomic  status  by  interviewer 
observation). 

Additional  data  include: 

leisure  activities,  voluntary  work. 
Attitudes  towards  science,  war,  terrorism. 
Index  of  internationalism.    Attitudes 
towards  a  shorter  working  week. 


Measurement  Scales 

Personal values scale, moral justification
scale, work values scale, work orientation
scale, Left-right (political) scale, Political
Protest scale, Materialist-post-materialist
scale, Greeley Spiritual Experience
Battery.

Repeated Questions and Sources

Eurobarometers - Left-right scale,
Materialist-post-materialist scale (also
Inglehart, R., The Silent Revolution,
Princeton University Press: 1977);
Satisfaction ratings (also Campbell et al.
1976);
Affect Balance Scale: Bradburn, N.M.,
The Structure of Psychological Well-Being
(Chicago: Aldine, 1969);
Political Protest Scale: Marsh A., Protest
and Political Consciousness (London: Sage,
1977);
Gallup trend questions - religious beliefs
(1947, 1968 trends), important features in
a marriage (1952 trend).

Publications/reports 

Stoetzel, J., Les valeurs du temps present
(Paris: Presses Universitaires de France,
1983).
Harding, S.D. and Phillips, D.R. with
Fogarty, M., Contrasting values in Western
Europe: Unity, diversity and change
(London: Macmillan, 1986)
Abrams M., Gerard D., and Timms N.
(Eds), Values and Social change in Britain
(London: Macmillan 1985)
Rezsohazy R. and Kerkhofs J. (Eds),
L'Univers des Belges: valeurs anciennes et
valeurs nouvelles dans les annees 80
(Louvain-La-Neuve: CIACO 1984)
Fogarty M., Ryan L., and Lee J., Irish
values and attitudes (Dublin: Dominican
Press 1984)
Orizo F.A., Espana, entre la apatia y el
cambio social (Madrid: Editorial
MAPFRE 1983)




Status: 

Access C, Class II (SPSS), Size L

Sampling  Procedures: 

Quota sample: France, Italy, Spain, West
Germany - quotas set by age, sex and
occupation on the basis of census data.
Simple random sample: Denmark, Great
Britain, Northern Ireland, Netherlands,
Ireland.    In  each  country  a  random 
selection  of  sampling  points  was  made 
according  to  a  geographical  distribution 
ensuring  that  all  types  of  areas  (rural, 
urban  etc.)  were  represented  according  to 
their  proportion  in  the  population. 
In  addition  to  representative  sampling,  a 
booster  quota  sample  of  200  young  adults 
aged  18-24  was  used  in  each  country. 

Method  of  Data  Collection: 
Oral  Interview 

Population: 

Adults,  18  years  and  over 

Time  dimensions: 

Trend  study:  One  wave 


Date of Data Collection:

March 1981 to May 1981

Cases (target):

12000

Cases (obtained):

12463

Sample Size by Country:

Belgium        1145
Denmark        1182
Eire           1217
France         1199
Great Britain  1231
Italy          1348
Netherlands    1221
N. Ireland      312
Spain          2303
W. Germany     1305

Number of Variables: 139

Number of Cards per Case: 6

National samples weighted for age (NB
booster sample). For aggregate European
totals, weights were applied to each
country as follows: Belgium 3.42,
Denmark 1.82, France 17.43, GB 19.06,
Italy 19.75, Netherlands 4.83, N. Ireland
0.49, Spain 11.60, Eire 1.03, W. Germany
20.58.
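
As an illustration of how such country weights might enter an aggregate figure, the sketch below computes a weighted mean of per-country values. It rests on an assumption not stated in the study description: that the ten weights act as relative population shares (they sum to roughly 100). The function name and the idea of averaging country-level percentages are hypothetical and are not the Archive's documented procedure.

```python
# Country weights as given in the study description.
COUNTRY_WEIGHTS = {
    "Belgium": 3.42,
    "Denmark": 1.82,
    "France": 17.43,
    "Great Britain": 19.06,
    "Italy": 19.75,
    "Netherlands": 4.83,
    "Northern Ireland": 0.49,
    "Spain": 11.60,
    "Eire": 1.03,
    "West Germany": 20.58,
}


def european_total(country_values, weights=COUNTRY_WEIGHTS):
    """Weighted mean of per-country values (e.g. percentages).

    Assumes the weights are relative population shares; only the
    countries present in country_values contribute to the result.
    """
    total_weight = sum(weights[country] for country in country_values)
    weighted_sum = sum(
        weights[country] * value for country, value in country_values.items()
    )
    return weighted_sum / total_weight
```

Under this assumption a large country such as West Germany moves the aggregate far more than Northern Ireland does, even though Northern Ireland's 312 interviews are about a quarter of the West German sample.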

Depositors: 

Moor R.A. de, European Value Systems
Study Group

Principal Investigators:

Harding S.D., Nene College, Dept of Psychology
Moor R.A. de, University of Tilburg (Netherlands)
Kerkhofs J., Katholieke Universiteit Leuven
European Value Systems Study Group

Data  Collectors: 

Social  Surveys  (Gallup  Poll)  Ltd 

Dimarso  (Brussels) 

Observa  SA  (Denmark) 

Irish  Marketing  Surveys  (Dublin) 

Faits  et  Opinions  (France) 

Doxa  (Milan) 

NIPO  (Amsterdam) 

Data  SA  (Spain) 

Institut fuer Demoskopie Allensbach

Access  to  the  data  and  documentation  is 
available  from  the  Archive  on  a  single-user 
basis  upon  application.    A  short  research 
abstract,  describing  the  intended  use  of  the  data 
is  required  by  the  depositor,  to  whom  each 
individual  application  will  be  sent  for  approval 
of release. Data are available in SPSSx format.


Other National Data Sets

The Archive also holds data from the national
surveys in Norway, Australia and Hungary.
Negotiations are underway for release of other
national data sets as well. Data from the U.S.,
Canada, Finland and Mexico are the most likely
next arrivals. We would be most grateful for
information on other national data. The
Norwegian Social Science Data Services have
also recently produced a teaching package based
on the data called "Ung i Europa" (Youth in
Europe). Data from 11 countries for about 350
respondents per country in the age group 18 to
24 are included. Details from NSD, Hans
Holmboesgt. 22, 5007 Bergen, Norway.


Publications 

A  recent  list  of  publications  based  on  the  data, 
or utilising the data, is available from the
Archive upon application.


The Ups and Downs of Cross-National Survey Research


by Tom W. Smith¹
NORC²

University  of  Chicago 


What is important in comparative research
is the exchange of findings - replicative
testing of the same theories in varying
social contexts. When the findings are
similar, evidence accumulates to support
their generality.... When findings are
different, we need to explain these
differences.... What we need to do is
control for theoretically significant
system-level differences that can be
expressed as variables (Przeworski and
Teune, 1971, p. 134).

In the social sciences' search for lawfulness,
cross-national survey research is one of the
most rewarding and punishing of pathways.
When the same models work across nations
with the same hypotheses confirmed, the
generalizability of our findings is greatly
strengthened and our understanding of human
society advanced. When the same models fail
to work across nations and the same hypotheses
are not supported, we know that some
additional variables must be incorporated into
our models. This search for the right additional
variables is often arduous and, as we shall see,
may lead us through methodological bogs, but
when the proper specification or the missing
variables can be discovered our understanding of
human society is greatly expanded (Kohn, 1987;
Rokkan, Viet, Verba, and Almasy, 1969; Niessen
and Peschar, 1982; Armer and Marsh, 1982;
Szalai and Petrella, 1977).


¹Presented at the International Association for
Social Science Information Service and
Technology (IASSIST) Conference held in
Washington, D.C., May 26-29, 1988.

²GSS Cross-National Report No. 8. May, 1988.
Revised December, 1988.

This research was done for the General Social
Survey project directed by James A. Davis and
Tom W. Smith. The project is funded by the
National Science Foundation, Grant No.
SES-8747227.


Basic  Approaches 

When  confronted  with  a  cross-national 
difference  between  two  or  more  nations,  there 
are  two  basic  approaches  towards  explaining  the 
difference,  the  idiographic  and  continuum 
(Jasper,  1987).    The  idiographic  approach  looks 
to  unique,  special  case  explanations  for  the 
difference.    Usually  some  particular  historical 
event  or  distinctive  culture  trait  is  offered  to 
explain  the  variation.    This  approach  is 
qualitative  and  tends  to  be  used  in  the 
disciplines  of  history,  anthropology,  and 
personality  psychology.    It  is  also  more  common 
when  only  a  small  number  of  nations  are  being 
compared  and  especially  when  only  two  are 
under investigation. An example of this
approach is the common "except for the South"
caveat  that  is  often  used  to  describe  nineteenth 
century  America,  such  as  "America  was  a  nation 
of  small,  family  farms,  except  for  the  South," 
"America  had  a  two  party  system,  except  for 
the  South,"  or  "Americans  believed  that  'all 
men  are  created  equal,'  except  for  the  South." 
In  each  of  these  cases  the  reason  for  the 
Southern  exception  was  slavery,  which,  as  befits 
an  idiographic  perspective,  was  widely  known  as 
that  "peculiar  institution." 

On  the  other  hand,  the  continuum  approach 
assumes  that  nations  vary  along  various 
underlying  scales  and  that  differences  between 
nations  arise  from  their  different  values  on 
these  variable  scales.    This  approach  is 
quantitative  and  multivariate  and  is  most 
frequently employed in economics, sociology,
and political science. It tends to be used when
a large number of nations are being compared.
An  example  would  be  a  study  that  explained 
differences  in  life  expectancy  across  nations  by 
the  number  of  doctors  and  hospital  beds  per 
capita  and  which  in  turn  explained  disparities  in 
these  medical  resources  by  economic  level  (e.g. 
per  capita  GNP  or  energy  consumption). 

Of  course  these  are  not  mutually  exclusive 
techniques.    A  particular  scholar's  analysis  could 
blend  together  these  two  approaches  or  use 
them in combination. For example, a detailed
historical and cultural understanding of the
nations involved might well suggest variables
that could be quantified and then utilized in a
multivariate analysis.


Alternatively, a residual or outlier analysis might
indicate that certain variables explained most of
the cross-national variation, but that one nation
or perhaps a group of nations related in some
way (e.g. by a common heritage or geographic
proximity) deviated from the generally good fit
of the model. One might have to offer
idiographic explanations for such outliers. This
residual approach has great promise since it
allows us to 1) develop a model that better
explains the cross-national variation and 2)
fruitfully combine qualitative and quantitative
techniques (and disciplines such as history and
economics). The dangers are that 1) we might
overuse particularistic explanations for outliers
so that we end up accepting as correct a basic
model that really is misspecified and 2) we
accept unique, historical explanations that by
their nature cannot be subjected to quantitative
verification instead of uncovering more general
explanations that both could apply in more
circumstances and would be subject to empirical
verification. For example, if we were
conducting a study of tolerance of cultural
pluralism and found that Canada was more
tolerant than the general model predicted, we
could treat Canada as a special case (or as a
dummy variable in the multivariate analysis)
and make reference to its dual colonial history
and two charter groups. Rather than going to
this particularistic (and unconfirmable) approach,
we might decide that Canada was more tolerant
because of the size of its minority language
population. We could then add a variable, %
minority language speakers, that might explain
why Canada was an outlier as well as improve
the fit of the model across all countries. Of
course it might not be readily possible to come
up with general variables to explain outliers.
There may be complicated, unique historical
traditions in Canada that truly explain its special
leanings which could not be readily reduced to
quantifiable, general variables that could be
coded for all nations.
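
The residual or outlier analysis discussed in this section can be sketched in a few lines. The example below is only an illustration under stated assumptions: it fits a one-predictor least-squares line across nations and flags any nation whose residual is more than two residual standard deviations from zero. The function names, the threshold of two standard deviations, and the data (nations labelled A to H) are all hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def flag_outliers(data, threshold=2.0):
    """Return {nation: residual} for nations that deviate from the fit.

    `data` maps each nation to a (predictor, outcome) pair.
    """
    nations = list(data)
    xs = [data[n][0] for n in nations]
    ys = [data[n][1] for n in nations]
    a, b = fit_line(xs, ys)
    residuals = {n: y - (a + b * x) for n, x, y in zip(nations, xs, ys)}
    sd = pstdev(list(residuals.values()))
    return {n: r for n, r in residuals.items() if sd and abs(r) > threshold * sd}


# Invented data: seven nations follow the general pattern, one does not.
example = {
    "A": (1, 2), "B": (2, 4), "C": (3, 6), "D": (4, 8),
    "E": (5, 20), "F": (6, 12), "G": (7, 14), "H": (8, 16),
}
print(sorted(flag_outliers(example)))
```

A flagged nation is then a candidate either for an idiographic explanation or, preferably, for a new general variable that can be coded for every nation and added to the model.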


Measurement  Difficulties 

Whether  an  idiographic,  continuum,  or 
combination  approach  is  utilized,  cross-national 
survey  research  offers  great  potential  for  both 
theoretical  refinement  of  our  understanding  of 
human  society  and  a  means  to  test  our  theories. 
But cross-national research must also overcome
great barriers to achieve its potential. The
barriers  to  the  successful  completion  of 
cross-national  survey  research  (and  many  other 
related  types  of  cross-national  research)  are  so 
great  that  a  study  of  articles  in  Comparative 
Politics  and  Comparative  Political  Studies  from 
1968  to  1982  found  that  62%  actually  analyzed 
only  one  country  (Jackman,  1985). 

First,  there  are  the  organizational  difficulties. 
The  administration  of  the  cross-national  survey 
efforts  is  always  complex,  involving  the 
coordination  of  principal  investigators,  funding 
sources,  data  collection  and  research  institutions, 
and  perhaps  governments.    Next,  the  cost  is 
high. Roughly speaking, the cost is a multiple
of the number of nations participating. The
greater the number of participating nations, the
greater the intellectual potential, but the higher
the cost. In addition, the planning, execution,
and analysis of the research design take much
longer than any single nation effort. From my
experience  with  the  International  Social  Survey 
Program  (ISSP)  (Smith,  1986;  Kuechler,  1987),  I 
would  say  that  it  takes  3-4  times  as  long  to  do 
a  study  from  start  to  finish  across  a  half  dozen 
nations  as  it  does  for  a  similar  study  in  one 
nation. 

Second,  there  are  basic  measurement  issues  that 
make  cross-national  survey  research  extremely 
challenging  (Dogan  and  Pelassy,  1984;  Rokkan, 
Viet, Verba and Almasy, 1969; Vallier, 1971).
As I once remarked, "Equivalency, the design of
survey instruments that are efficient, reliable,
and valid not only in a single society, but across
several nations is a difficult task." Most
broadly, cultural differences between nations
may hinder equivalency. Presumably surveys of
dissatisfaction with government would get
meaningless responses in repressive regimes that
suppressed all dissent. Or surveys of women
might not be readily possible in fundamentalist
Islamic nations, or at least not if using home
interviews by male interviewers. Or certain
institutions may vary across countries, making
comparisons difficult. For example, in the


United  States  the  President  is  both  head  of 
government  and  head  of  state,  while  in  most 
European  governments  these  offices  are  held  by 
different  people  (e.g.  in  England  respectively  by 
the  prime  minister  and  the  monarch).    Questions 
about  Reagan  and  Thatcher  usually  ignore  these 
differences  in  their  institutional  roles. 

Of  course  the  cultural  difference  that  most  often 
creates  problems  is  language.    There  are 
well-established  procedures  for  parallel  and  back 
translation to ensure an optimum linguistic
match and these must naturally be rigorously
employed. These techniques do not ensure that
true equivalency is achieved in survey questions,
however. Even given the most careful of
translations it is nearly pointless to compare any
two questions that employ abstract concepts and
subjective response categories. While it is
probably possible to ask effectively exact
equivalents to the questions "In what year were
you born?" or "Did you vote in the last
national election?", it is highly doubtful that the
responses  to  the  query  "Are  you  very  happy, 
pretty  happy,  or  not  too  happy?"  are 
comparable.    In  all  likelihood  the  closest 
linguistic  equivalent  to  "happy"  will  differ  from 
the  English  concept  in  various  ways,  perhaps 
conveying  different  connotations  and  tapping 
other  dimensions  (e.g.  satisfaction),  but  at  the 
minimum  probably  expressing  a  different  level 
of  intensity  (say  on  an  absolute  bliss  to  sadness 
continuum).    Similarly  the  adjectives  "very", 
"pretty",  and  "not  too"  are  unlikely  to  have 
precise  equivalents.    On  an  absolute  intensity 
scale running from 0 to 10, "very" might rank
as an 8 and "pretty" as a 5 while the closest
counterparts  in  a  second  language  might  be  at 
7.2  and  5.8  or  8.4  and  4.8,  a  difference  that 
would  certainly  either  produce  different 
marginals  or,  perhaps,  similar  marginals  that 
disguised  differences  in  the  absolute  happiness 
distributions. 
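
The effect of label intensity on marginals can be made concrete with a small sketch. It assumes, purely for illustration, that each label's intensity is the lowest level of happiness at which a respondent will choose it, and that two populations have exactly the same underlying distribution (levels 0.0 to 10.0 in steps of 0.1, stored in tenths). The cut points 8 and 5 versus 7.2 and 5.8 come from the text; the response model and the function name are assumptions.

```python
def marginals(levels, very_cut, pretty_cut):
    """Shares answering very / pretty / not too for given label cut points.

    All values are in tenths of a scale point, so 80 means 8.0.
    """
    n = len(levels)
    very = sum(1 for x in levels if x >= very_cut)
    pretty = sum(1 for x in levels if pretty_cut <= x < very_cut)
    return {
        "very": very / n,
        "pretty": pretty / n,
        "not too": (n - very - pretty) / n,
    }


# Identical underlying happiness in both populations: 0.0, 0.1, ..., 10.0.
levels = list(range(101))

language_1 = marginals(levels, very_cut=80, pretty_cut=50)  # 8.0 and 5.0
language_2 = marginals(levels, very_cut=72, pretty_cut=58)  # 7.2 and 5.8

print(language_1)
print(language_2)
```

Under these assumptions the two sets of marginals differ even though the populations are equally happy, which is the artifact described above.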

Methodological Solutions

At  least  four  solutions  have  been  offered  and 
each  is  certainly  worth  exploring.    The  first 
solution  is  to  use  numerical  scales  (e.g. 
ratio-level,  magnitude  measurement  scales  or 
thermometers). Numerical scales are believed to
reduce problems by providing a universally
understood set of categories that have precise
and similar meanings (e.g. 1,2,3 or 2:1, 3:1 have
cross-linguistic equivalents) and by removing the
need to come up with language labels to try
and denote the intensity of each category. The
problems  are  1)  that  many  (but  not  all)  of  these 
scales  are  more  complex  than  simpler  verbal 
items  and  that  this  may  actually  increase  the 
non-comparability  problem,  2)  that  such  an 
approach  does  not  address  the  problems  of 
variation  in  the  meaning  and  strength  of  the 
basic  abstraction  involved,  and  3)  that  different 
cultures  may  vary  in  their  understanding  of  and 
ways  of  responding  to  numerical  scales. 

Another  possibility,  in  a  sense  the  opposite  of 
the numerical approach, is the
"keep-it-simple-stupid" approach. For surveys
this would mean using only dichotomies.
Presumably  (but  it  could  certainly  use  empirical 
verification)  yes/no,  favor/oppose,  etc.  have 
similar  cutting  points  across  languages  (or  at 
least  that  there  are  equivalent  pairs,  even  if 
these  might  not  be  the  optimum  examples). 
The  argument  is  that  it  may  be  difficult  to 
determine,  because  of  language,  where  someone 
sits  on  a  continuum,  but  comparatively  easy  (in 
the  aggregate)  to  determine  on  which  side  of  a 
tipping  point  people  are.    But  this  approach  also 
begs  the  question  whether  the  underlying 
concept  is  being  understood  in  a  similar  enough 
fashion  in  both  languages  and  results  in  less 
information  than  a  question  with  say  seven 
categories.    At  least  in  terms  of  crude  number 
of  response  categories  one  would  have  to  ask 
three  to  six  dichotomies  (depending  on  the 
handling  of  DKs)  to  get  as  many  response 


categories  distinguished  as  one  seven-point 
question. 
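
One possible reading of the "three to six" range, offered as an interpretation and not as the author's stated derivation, is sketched below. Three yes/no items jointly produce eight answer patterns, which is enough to cover seven categories, while locating a respondent on an ordered seven-point scale by asking one dichotomy at each boundary takes six items. The function names are hypothetical.

```python
def items_needed_by_patterns(categories):
    """Fewest yes/no items whose joint answer patterns cover `categories`."""
    items = 0
    while 2 ** items < categories:
        items += 1
    return items


def items_needed_by_cut_points(categories):
    """One dichotomy per boundary between adjacent ordered categories."""
    return categories - 1


print(items_needed_by_patterns(7))
print(items_needed_by_cut_points(7))
```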

The  third  approach  is  to  attempt  to  calibrate 
the  scales  by  determining  the  strength  of  the 
verbal  labels  used.    This  would  permit  the 
adoption  of  labels  for  the  happiness  question 
that  cut  the  same  points  of  the  underlying 
bliss-sadness  continuum.    Similarly  by  both 
linguistic  analysis  and  numerical  evaluation  it 
would  be  possible  to  determine  whether 
happiness  was  an  equivalent  stimulus  in  two 
languages/cultures.    Ideally,  it  might  be  possible 
to  determine  that  certain  verbalizations  and 
scales  were  similar  between  languages  and  that 
such  equivalency  held  across  their  use  in  diverse 
questions.    As  such  it  might  be  possible  to 
develop  a  standard  set  of  response  categories 
that  could  be  used  across  many  different 
questions. This approach necessitates extensive
prior work on the meaning of words and the
strength of adjective labels. It also typically
assumes  that  modifiers  will  have  similar  strength 
across  a  wide  range  of  applications  (i.e.  in  all 
questions). 

The  final  approach  is  to  adopt  multiple 
indicators.    This  approach  differs  in  one 
significant  way  from  the  typical  psychometric 
scale  approach  in  that  not  only  would  multiple 
questions  be  employed,  but  different  response 
scales  would  be  used.    Consider  the  following 
scheme  to  assess  whether  the  French  or  English 
are  higher  on  psychological  well-being. 

A.  General  happiness 

B.  Overall  satisfaction 

C.  Domain-specific  satisfaction 

D. Bradburn's affect-balance scale


While  an  Anglo-French  comparison  on  any  one 
of  these  question  scales  would  be  suspect 
because of the language ambiguities outlined
above, it is possible to get unambiguous results
from  the  above  if  the  results  are  consistent  (i.e. 
the  French  leading  or  trailing  the  British  on  all 
measures).    Of  course,  if  all  the  results  are 
consistent  then  it  was  not  necessary  (except  for 
the  great  gain  in  precision  and  richness)  to  have 
used  more  than  one  indicator.    However,  the 
difference  is  that  with  one  indicator  you  never 
know  whether  the  observed  differences  are 
social  or  only  linguistic.    What  is  needed  is  at 
least  three  indicators  of  the  same  construct.    If 
one  has  two  and  they  disagree  it  might  be  that 
one  is  "real"  and  the  other  only  linguistic. 
With three you have a "tie breaker" in which
you can hopefully isolate the odd linguistic case
from the two consistent social differences.
(Think how this phrase might translate into
another language. It might come out as an
abuser of neckties or an undoer of knots. Of
course good translation and especially back
translation will avoid these pitfalls.) Of course
life need
not  be  so  simple.    Few  results  would  be  purely 
"linguistic"  or  "social".    Questions  may  differ 
because  they  fail  to  tap  the  same  dimension  or 
may  be  tapping  a  single  construct  in  one 
language,  but  two  distinct  constructs  in  a  second 
language,  or  because  of  other  problems. 
However we feel that three carefully translated
questions, pretested as reliable in each individual
language,  and  guided  by  clear  and  reasonably 
sound  social  science  theory  should  allow  us  to 
avoid  most  of  the  larger  problems,  most  of  the 
time  (but  not  all  problems  all  of  the  time). 
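
The "tie breaker" logic of using at least three indicators can be sketched as a simple majority rule. This is a minimal illustration, not a procedure taken from the paper: each indicator contributes only the direction of the difference between two nations, the majority direction is read as the likely social difference, and any dissenting indicator is singled out as possibly linguistic. The function name and the example values are hypothetical.

```python
def triangulate(differences):
    """Summarise the direction of several indicator differences.

    `differences` maps indicator names to (nation 1 minus nation 2).
    Returns (majority, dissenting): majority is +1, -1, or 0 when no
    direction has a majority; dissenting lists indicators that disagree.
    """
    signs = {name: (d > 0) - (d < 0) for name, d in differences.items()}
    positive = sum(1 for s in signs.values() if s > 0)
    negative = sum(1 for s in signs.values() if s < 0)
    if positive == negative:
        return 0, sorted(signs)
    majority = 1 if positive > negative else -1
    dissenting = sorted(n for n, s in signs.items() if s != majority)
    return majority, dissenting


# Invented differences on three indicators of well-being.
three = {"happiness": 0.3, "satisfaction": 0.2, "affect_balance": -0.1}
print(triangulate(three))

# With only two indicators that disagree, nothing can be concluded.
two = {"happiness": 0.3, "satisfaction": -0.2}
print(triangulate(two))
```

The rule deliberately ignores the size of each difference, so it cannot separate results that are partly linguistic and partly social.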

Of  course  these  techniques  are  not  mutually 
exclusive,  but  rather  directly  complementary. 
Adopting  multiple  indicators  is  the  first  step, 
but  in  the  design  of  these  multiple  indicators  it 
would  be  quite  sensible  to  use  a  numerical  scale 
for  some  indicators  and  of  course  extremely 
useful  to  learn  more  about  differences  in 
concept  words  and  the  intensity  of  modifiers  via 
open-ended questions and non-survey textual
analysis in the first case and via scaling and
relating techniques in the latter case.


Even  given  the  best  precautions  measurement 
artifacts  can  slip  through.    As  a  result  we  must 
be  prepared  to  detect  such  problems  at  the 
analysis  stage.    Basically  what  we  can  do  is 
inspect  the  data  for  anomalies  such  as  high 
levels  of  Don't  Knows  which  may  indicate  a 
lack  of  understanding  or  relevance  in  a 
particular  country  and  unusual  marginals  or 
relationships  which  may  indicate  not  true 
differences,  but  only  measurement  artifacts. 
When suspicious signs are found we must
review the question wordings, data codings, etc.
to see if there are any signs of unintended
differences. For example, in the 1987 ISSP
survey on social inequality one item asked
people to rate their social standing by placing
themselves on a 10-point ladder. The Dutch
data  showed  many  more  people  in  the  bottom 
three  rungs  than  any  other  nation.    Inspection 
of  their  instrument  indicated  no  apparent 
problem  with  the  wording,  but  their  ladder 
widened  at  the  bottom,  obviously  suggesting  to 
respondents  that  more  people  were  in  the  lower 
rungs  than  the  middle  and  top  (Smith,  1988). 
While  one  will  occasionally  find  such  a  smoking 
gun,  often  no  obvious  problems  will  be 
discovered. 
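
A first screening pass for such anomalies might look like the sketch below, which flags countries whose Don't Know rate on an item is far above the typical rate. The rule (more than twice the median across countries), the function name, and the rates are assumptions chosen for illustration. A flag only signals that the wording and coding deserve review; it does not show that an artifact exists.

```python
from statistics import median


def flag_high_dk(dk_rates, ratio=2.0):
    """Return countries whose Don't Know rate exceeds ratio * median rate."""
    typical = median(dk_rates.values())
    return sorted(c for c, rate in dk_rates.items() if rate > ratio * typical)


# Invented Don't Know rates for one item in four countries.
example = {"A": 0.04, "B": 0.05, "C": 0.06, "D": 0.21}
print(flag_high_dk(example))
```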

That leaves the investigator to ponder, is it real
or is it artifact? Assuming that neither the
triangulation procedure described above, the
methods review, nor any other internal analysis
clarifies the issue, one is left to apply certain
external tests. Perhaps first and foremost is to
ask whether the difference conforms to our a
priori theory. Second, if no prior theory had
been  formed  about  the  particular  difference 
observed,  one  might  examine  other  research 
including  histories  and  single  nation  studies  to 
see  if  the  difference  might  be  consistent  with 
known  attributes  of  the  country.    Third,  in 
special  cases  one  can  eliminate  language  as  a 
probable  factor  if  the  difference  appears  in 
some  countries  using  the  same  language  and  not 
in  others.    For  example,  the  US  and  to  a  lesser 
extent  Australia  are  less  supportive  of  the 
welfare state than are Western European
nations. Since Great Britain is English-speaking,
but  resembles  its  continental  brothers  more  than 
its  colonial  children,  we  can  safely  conclude  that 
language  is  not  the  explanation  for  the 
cross-national  differences  (Smith,  1987;  1988). 
In  another  special  case  we  can  look  at 
multi-lingual  countries  (e.g.    Canada, 
Switzerland,  Belgium)  to  see  if  a  national 
difference  is  really  due  to  country  and  not 
language (Inglehart and Rabier, 1985). For
example, if the French speaking Walloons in
Belgium are less happy than the French, while
the Dutch speaking Flemings are less happy than
people in the Netherlands, then we can be
reasonably sure that there is something about
being Belgian that makes people less happy
than being French or Dutch. Of course the
pattern need not always be so clear cut.
Finally, in some circumstances the best thing
may be to try to replicate the result, perhaps by
developing alternative indicators that should
show the same difference if the pattern is real
rather than artifactual.
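
The multi-lingual comparison can be expressed as a small check. The sketch below assumes that each language group inside one country has a mean score on some indicator and that a same-language reference nation exists for each group. If every group differs from its reference in the same direction, country is the more plausible explanation than language. The function name and all scores are invented.

```python
def country_effect(within_country, reference):
    """Compare language groups in one country with same-language nations.

    Both arguments map a language to a mean score. Returns "lower",
    "higher", or "mixed" according to whether every group differs from
    its reference nation in the same direction.
    """
    diffs = [within_country[lang] - reference[lang] for lang in within_country]
    if all(d < 0 for d in diffs):
        return "lower"
    if all(d > 0 for d in diffs):
        return "higher"
    return "mixed"


# Invented mean happiness scores, for illustration only.
belgium = {"French": 6.8, "Dutch": 7.1}
neighbours = {"French": 7.2, "Dutch": 7.6}
print(country_effect(belgium, neighbours))
```

A "mixed" result corresponds to the less clear-cut patterns mentioned in the text, where language and country cannot be separated this simply.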


Summary 

Overall, the "downs" of cross-national survey
research are neither trivial nor intractable. They
are inherent and unavoidable problems that
must and can be coped with. They can be
minimized, but not eliminated. Despite these
problems, the "ups" of cross-national survey
research are well worth the effort. Profound
insights  into  the  human  condition  are  more 
likely  to  emerge  from  such  comparative 
perspectives than from alternative approaches.


Introduction

The  occurrence  of  errors  in  data  is  one  of  the  most  serious  problems  affecting  information 
management and utilization. As a direct result, the identification and resolution of data errors is
critical  to  any  information  management  system.    All  data  acquisition  and  management  efforts  utilize 
some  form  of  error  detection  or  data  editing  prior  to  loading  or  updating  information  into  a  data 
base  management  or  archival  system.    Although  the  specific  nature  of  the  data  and  data  structures 
may  differ  and  the  required  precision  and  accuracy  may  vary,  the  occurrence  of  errors  and  systems 
to  control  errors  seems  constant.    Approaches  and  techniques  to  identify  and  resolve  data  errors  have 
general  applicability  independent  of  specific  data  content  areas. 

The  purpose  of  this  paper  is  to  discuss  the  use  of  SAS  to  support  a  general  data  quality  assurance 
and control system within a large, complex data acquisition and management effort. The specific
application is in the area of analytical environmental data or data produced when environmental
samples are analyzed in laboratories to determine the occurrence and concentration of priority
pollutants and hazardous substances. The nature of data quality assurance and control and the nature
of the data and application are briefly introduced. The rationale and effectiveness of using SAS are
then  presented. 


'Presented  at  the  International  Association  for  Social  Science  Information  Service  and  Technology 
(IASSIST) Conference held in Washington, D.C., U.S.A. on May 26-29, 1988

Finally, the application of specific SAS procedures within each major data quality assurance and
control  activity  is  discussed  and  examples  given.    Primary  SAS  features  discussed  include  format 
tables,  multiple  record  keys,  variable  length  record  emulation,  macros,  conditional  output,  temporary 
work files and automatic character/numeric conversion. Each of these capabilities is used for this
particular  environmental  data  application,  but  have  general  applicability  to  any  data  error 
identification  and  resolution  problem. 


Nature  of  Data  Quality  Assurance 

In  large  scale  data  acquisition  and  management  systems,  data  validation,  verification,  or  quality 
control  editing  becomes  so  important  that  a  separate  operation  known  as  data  quality  assurance  and 
control  is  usually  created.    The  process  is  also  known  as  data  editing  or  cleaning.    The  primary  focus 
of such a system is on the identification and resolution of data errors. Such a process ensures that
data  is  of  a  known  quality  and  within  specific  data  quality  control  limits. 

Three  interrelated  activities  are  involved  in  data  quality  control  or  editing: 

•  Limit  specification:  developing  and  stating  quality  control  limits  or  specifications  for  defining  data 
errors.    Errors  are  defined  by  users  of  data  and  are  not  an  inherent  property  of  data 

•  Error  identification:  identifying  and  reporting  samples  or  data  elements  within  samples  which 
violate  established  limits 

•  Error  resolution:  developing  methods  and  techniques  to  resolve  issues  of  data  integrity  resulting 
from the identification of a value or values as being inconsistent with or outside the limits.

Each  of  these  three  activities  may  be  conducted  as  a  manual  or  computer  assisted  process  depending 
on  the  format  of  the  data  being  evaluated  and  designed  according  to  theoretical  or  empirically  based 
information. 
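
The first two activities can be illustrated with a short sketch. It is written in Python, not in the SAS that the paper describes, and every name in it is hypothetical. Limits are stated as a table of allowed ranges per data element (limit specification), and a single pass over the samples reports each missing or out-of-range value (error identification). The concentration range reads the span of values quoted later in the paper as 10^-7 to 10^6, which is itself an interpretation of the text.

```python
# Limit specification: allowed range for each data element (hypothetical).
LIMITS = {
    "concentration": (1e-7, 1e6),
    "ph": (0.0, 14.0),
}


def identify_errors(samples, limits=LIMITS):
    """Return (sample_id, element, value, reason) for every violation."""
    errors = []
    for sample in samples:
        for element, (low, high) in limits.items():
            value = sample.get(element)
            if value is None:
                errors.append((sample["id"], element, value, "missing"))
            elif not low <= value <= high:
                errors.append((sample["id"], element, value, "out of range"))
    return errors


# Invented samples: the second has one impossible and one missing value.
samples = [
    {"id": "S1", "concentration": 0.5, "ph": 7.0},
    {"id": "S2", "concentration": 2e6, "ph": None},
]
for error in identify_errors(samples):
    print(error)
```

Error resolution is left out on purpose, since deciding what to do with a flagged value depends on the users who defined the limits.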


Environmental  Data  Structures 

The basis for conducting most environmental assessment, monitoring, or remedial actions is the
results from the laboratory analysis of soil and water samples. Each sample is collected and screened
for between 25 and 150 priority pollutants and hazardous substances. Actions are planned according to
the compounds identified and the concentration levels determined.

For this particular environmental program, over 80,000 samples per year are analyzed. The number
of data elements per sample or analysis may total over 2,000 depending on the method of analysis
and the laboratory quality control specifications. The volume of information being checked for a
single group of samples is also enormous. Data for only 50 samples involve approximately 900,000
bytes and over 10,000 records. Many different units of collection and analysis occur and include
sampling episode or group, sample, method, instrument, and compound. Linkage and data association
may involve 6-7 identifiers or keys. Data are both numeric and alphabetic, with values ranging
from 10^-7 to 10^6, and missing data also occur.

The  applications  area  for  this  particular  data  quality  assurance  and  control  system  is  in  the 
Environmental  Protection  Agency's  Analytical  Operations  Branch  within  the  Superfund  Program  and 
the  Office  of  Emergency  and  Remedial  Response. 


SAS  as  a  Generalized  Data  Quality  Assurance  Support  System 

The  system  discussed  in  this  paper  runs  on  an  IBM  3090  mainframe  and  consists  of  2  program 
libraries, 2 table libraries, and 3 SAS databases (containing over 20 individual SAS files) which reside
on  both  disk  and  tape.    The  data  checking  portion  of  the  system  resides  in  12  program  modules 
which  total  over  15,000  lines  of  code.    The  entire  system  is  supported  by  25  programs  totaling  over 
35,000  lines  of  code.    This  application  is  large  and  was  designed  to  run  on  a  mainframe.    It 
processes,  on  the  average,  12  million  bytes  of  information  per  day.    Because  of  the  modular  design  of 
this  system,  certain  portions  can  be  easily  modified  to  run  at  the  PC  level  if  needed. 

The  utilization  of  SAS  in  developing  this  data  quality  assurance  system  has  several  advantages  over 
using a procedural language. If procedural languages are used to design data quality control systems
in  scientific  areas,  several  problems  can  occur: 

•  Initial  development,  frequent  modification  and  necessary  documentation  by  professional 
programmers  represents  considerable  time  and  resource  expenditures. 

• Users or knowledgeable persons cannot make changes directly. Programmers do not understand
the  substance  and  meaning  of  the  data,  so  lengthy  requirements  analysis  discussions  must  be 
conducted  in  a  third  language. 

•  The  need  for  formalized  requirements  analysis  and  system  documentation  again  impacts  time  and 
resources and diverts the data user's time from research, analysis and problem solving to problem
representation  and  monitoring. 

•  Techniques  and  approach  are  not  directly  generalizable  to  new  applications. 

The  desired  objectives  for  a  data  quality  assurance  and  control  support  system  language  in  scientific 
areas  include: 

•  A  user  oriented  non-procedural  language

•  Ease  in  implementation,  extension  and  modification 

•  Availability  of  statistical  procedures  and  functions 

•  Capabilities  for  dealing  with  missing  data  and  scientific  notation 

• Facility for effectively supporting users of different expertise levels within the same system

•  Generalizability  to  other  data  and  applications. 

The following section discusses and demonstrates the SAS features and capabilities used in
this particular application. These features and the general approach can be extended to a variety of
data validation and quality control efforts. SAS features, and examples of their
application to the data, are presented according to each of the major areas of data quality assurance:
designing QC limits, error identification and error resolution.


SAS  Features  Used  in  Data  Quality  Control 

For  each  of  the  major  data  quality  assurance  and  control  areas,  the  nature  of  the  required  activities, 
functions  or  problems  is  presented.    Within  each  activity  or  problem,  the  SAS  features  that  are 
important to this application are then discussed and specific examples presented. Table 1, "SAS
Features Used by QA/QC Activity," summarizes the SAS features by activity performed for each of
the three application areas: designing limits, error identification, and error resolution.

Designing  Quality  Control  Limits 

Data  quality  control  limits  or  specifications  usually  include  one  range  check  for  each  data  element  or 
variable  and  as  many  logic  checks  as  necessary  to  evaluate  all  relationships  of  interest.    A  range 
check  defines  the  expected  or  "allowable"  codes  or  values  for  a  given  data  element.    A  logic 
specification  defines  the  expected  logical  relationship  between  two  or  more  data  elements.    Logic 
checks  usually  involve  issues  of  discrepancy,  consistency  or  invalid  relationships  between  data  items. 

Within  the  range  and  logic  classification,  two  other  distinctions  can  be  made.    Specifications  can  be 
delineated according to the underlying justification or basis for the check: syntax (structure), semantic
(meaning)  and  statistical  (distributional  properties).    Specifications  can  also  be  differentiated  on  the 
basis  of  whether  the  checks  occur  within  a  single  logical  file  or  across  multiple  files.    The 
longitudinal  checking  of  data  or  the  comparison  of  different  data  sources  are  examples  of  multiple 
file  or  cross  file  checking. 

TABLE 1
SAS FEATURES USED BY QA/QC ACTIVITY

                                        QA/QC ACTIVITIES
SAS FEATURE                             DESIGNING QC LIMITS           ERROR IDENTIFICATION             ERROR RESOLUTION

FORMAT TABLES                           Storing Data Specifications   Retrieving Data Specifications   Retrieving Error Message
MULTIPLE FILE KEYS                      Data Linkage                  Data Linkage                     -
VARIABLE LENGTH RECORD EMULATION        Efficient Space Utilization   -                                -
SORT AND MERGE                          File Manipulation             File Manipulation                File Manipulation
FUNCTIONS                               Field Manipulation            Field Manipulation               -
MACROS, %INCLUDE, LINK, ETC.            -                             Processing Control               -
CONDITIONAL OUTPUT                      File Manipulation             File Manipulation                File Manipulation
TEMPORARY WORK FILES                    File Manipulation             File Manipulation                File Manipulation
SAS REPORT FEATURE                      -                             Error Reporting                  -
MISSING DATA FEATURE                    -                             Data Editing                     -
AUTOMATIC CHAR.-NUM./NUM.-CHAR. CONV.   Data Editing                  Data Editing                     -
UPDATE PROCEDURE                        -                             -                                Correcting Errors on Data
FULL SCREEN FILE EDITING                -                             -                                Correcting Errors on Error Record


If quality control is conducted by a person or group other than the end user, the customary
procedure is for that person or group to create all implied range and logic syntax checks. Certain
obvious semantic checks might also be proposed. The quality control specifications are then reviewed
and confirmed by the user prior to implementation. Additional semantic and statistical checks are
contributed by the data user to the general set and complete the logical specifications.

The major components of designing quality control specifications include:

• Defining, representing, and storing limits and relationships,

• Linking files and data elements, and

• Storing and reporting results.

The  use  of  SAS  features  in  performing  each  function  is  discussed  and  examples  given.    Primary  SAS 
features  include:  format  tables,  multiple  file  keys,  variable  length  record  emulation,  functions, 
conditional  output,  temporary  work  files  and  automatic  character/numerical  conversion. 

Defining,  Representing,  and  Storing  Variable  Limits 

Defining limits and relationships of variables requires each variable to be examined to determine its
appropriate value or range of values. It is also necessary to determine values that depend on some
other data qualifier. For example, the valid values for variable A are '0' or '1'; these are the only
two correct entries which variable A can contain. Variable B, however, contains conditional values: if
variable A is '0' then variable B may contain either 'A' or 'B'; if variable A is '1' then variable B
may contain either 'C', 'D', or 'E'.

All variable comparisons are written:

     IF VAR1 = VARA OR VAR1 = VARB THEN ...

Conditional edits must be coded:

     IF VAR1 = '0' THEN DO ...
        IF VAR1 = VARA OR VAR1 = VARB THEN ...
     END

Complex conditional statements similar to the following are not uncommon:

     IF A1 = 'EVALA' AND A2 = 'EVALB' AND A3 = 'EVALC' THEN DO ...
        IF A4 = 'INDA' AND A5 = 'INBA' AND A6 = 'TOXAPH' THEN ...
        IF A7 = 'AR1016' AND A8 = 'AR1242' AND A9 = 'AR1248' THEN ...
     END
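
The nested conditions above can also be expressed as data rather than as program logic. The
following is a minimal sketch of the variable A and variable B rule described earlier. It is
written in Python rather than SAS and is not taken from the application; the names VALID_A,
VALID_B_GIVEN_A and check_record are hypothetical.

```python
# Hypothetical illustration of the variable A / variable B rule from the text.
# Allowed values of A, and allowed values of B conditional on the value of A.
VALID_A = {"0", "1"}
VALID_B_GIVEN_A = {
    "0": {"A", "B"},
    "1": {"C", "D", "E"},
}


def check_record(a, b):
    """Return a list of problems found in one (A, B) pair."""
    problems = []
    if a not in VALID_A:
        # Range check: A itself is outside its allowable codes.
        problems.append("A out of range")
    elif b not in VALID_B_GIVEN_A[a]:
        # Logic check: B is not allowed for this value of A.
        problems.append("B inconsistent with A")
    return problems
```

Keeping the allowable values in a table of this kind, instead of in the IF statements themselves,
means that a change to the specification does not require a change to the checking logic.
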

Problems are encountered when trying to represent and store these multiple values and limits and to
determine the validity of variables conditionally. Dozens of possible values may need to be stored
and extracted for each variable. Additional problems are encountered when trying to develop the
logic and software needed to perform the edits. The main SAS feature used in representing and
storing QC limits is the SAS format table.

SAS format tables allow convenient storage and retrieval of data specifications. Values can be
extracted from a SAS table and compared to data values for a variable. The use of SAS tables
allows specifications that are needed by many programs to be stored in one place. Format tables
also facilitate organizing and updating specifications: to update a specification, one simply changes the
table and reloads the format library. This change is immediately picked up by all programs using
this table.

In  the  current  application,  edit  specifications  (i.e.  values,  limits,  and  ranges)  are  stored  in  SAS  format 
tables.    These  tables  are  used  to  structure  and  organize  the  edit  specifications.    The  table  consists  of 
a  'key'  lookup  field  followed  by  one  or  more  values.    For  example,  given  the  following  data 
specifications  for  four  chemical  compounds: 

                         Variable 1    Variable 2    Variable 3
                         Range         Limit         Valid Codes

Compound 1 (code=110)    10-330        19800         M
Compound 2 (code=120)    10-340        19810         LF
Compound 3 (code=130)    50-340        19800         ML
Compound 4 (code=140)    10-300        19800         MLE

These  specifications,  after  being  organized  and  tabulated,  can  easily  be  transformed  into  a  SAS  format 
table.    The  following  is  an  example  of  the  above  data  specifications  converted  to  a  table  which  is 
ready  to  be  loaded  into  a  SAS  format  table. 


(1)    (2)    (3)    (4)      (5) ...

110    10     330    19800    M
120    10     340    19810    LF
130    50     340    19800    ML
140    10     300    19800    MLE

This example shows four data specifications for each of the four compounds. Column (1) is the key,
or lookup code, for the compound. It is followed by four columns, each holding a limit or valid
code for one of the fields associated with this compound. Column (2) is the lower limit for variable 1,
column (3) is the upper limit for the same variable, column (4) is the upper limit for variable 2
(note that, according to the specifications, only the upper limit of variable 2 needs to be checked), and
column (5) contains the valid codes which variable 3 may contain.
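
As an illustration only, the four specification lines can be read and applied as sketched below.
The sketch is in Python, not the SAS used in the application. It assumes that the limits are
inclusive and that each letter in column (5) is a separate valid code; neither assumption is
stated in the specifications, and the names load_specs and check are hypothetical.

```python
# The four specification lines from the example table, as blank-separated text.
SPEC_TEXT = """\
110 10 330 19800 M
120 10 340 19810 LF
130 50 340 19800 ML
140 10 300 19800 MLE
"""


def load_specs(text):
    """Build {compound code: limits} from the text table."""
    specs = {}
    for line in text.splitlines():
        key, low1, high1, high2, codes = line.split()
        specs[key] = {
            "var1_low": float(low1),
            "var1_high": float(high1),
            "var2_high": float(high2),
            "var3_codes": set(codes),  # assumed: one valid code per letter
        }
    return specs


def check(specs, compound, var1, var2, var3):
    """Return the names of the variables that violate the compound's limits."""
    s = specs[compound]
    failed = []
    if not s["var1_low"] <= var1 <= s["var1_high"]:  # inclusive range assumed
        failed.append("var1")
    if var2 > s["var2_high"]:  # only the upper limit of variable 2 is checked
        failed.append("var2")
    if var3 not in s["var3_codes"]:
        failed.append("var3")
    return failed


specs = load_specs(SPEC_TEXT)
```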

This file is easy to build and maintain: it is simply an 80-byte text file which can be created and
maintained with any text editor. This file can then be loaded into a SAS format table (by using the
SAS FORMAT procedure), where it is accessible to all programs.

Multiple tables for various edits and reports may be stored in one file. In this application all format
tables  are  stored  in  one  text  file.    By  adding  a  'table  code'  to  the  beginning  of  each  line  an 
unlimited  number  of  tables  can  be  stored  in  one  file.    The  'table  code'  signals  the  SAS  program 
which  loads  the  text  tables  into  SAS  format  tables  to  write  each  group  of  'table  codes'  to  a  different 
format  table.    Keeping  all  tables  for  a  system  in  one  place  can  make  maintenance  and  data 
specification  modifications  much  easier.    An  example  of  how  this  text  file  is  structured  is  shown  in 
Appendix  A. 

The  Appendix  A  example  consists  of  several  different  types  of  tables.    Table  #1  stores  error  codes 
and  error  text  messages,  Tables  #2  and  #3  store  data  QC  limits,  Table  #4  stores  chemical 
compound  names  used  in  formatting  reports,  and  Table  #5  stores  more  data  QC  limits.    Notice  that 
each  table  consists  of  the  same  components:  a  table  code,  a  look-up  code  or  key,  followed  by  a  text 
field  which  may  consist  of  one  or  more  character  or  numeric  fields  (up  to  40  bytes  long).    Appendix 
B  is  an  example  of  the  SAS  program  which  reads  the  text  file  (Appendix  A)  and  creates  a  SAS 
format  table  for  each  text  table. 

The program in Appendix B builds the SAS code necessary to load each individual text table into a
SAS format table. The benefit of loading tables this way is the ease with which
updates to the data specifications can be made; it also allows the user to control the maintenance
of the data specifications. To change edit specifications, the user simply edits the text table file
(Appendix A) and submits the code in Appendix B to execute in a batch job. The changes will then
be reflected in all programs accessing these tables.
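
A minimal Python sketch of the 'table code' idea follows. It is not the Appendix B program. The
line layout (table code, key, then text, separated by blanks) and the table codes T1 and T2 are
assumptions made for the example; the sample lines reuse values shown elsewhere in this paper.

```python
from collections import defaultdict

# Hypothetical text file: table code, lookup key, then the text field.
TABLE_FILE = """\
T1 980 Injection date
T1 990 Instrument identifier
T2 110 10 330 19800 M
T2 120 10 340 19810 LF
"""


def load_tables(text):
    """Group the lines by table code into {table code: {key: text}}."""
    tables = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split off the table code and the key; keep the rest as one text field.
        table_code, key, value = line.split(maxsplit=2)
        tables[table_code][key] = value
    return dict(tables)


tables = load_tables(TABLE_FILE)
```

Because every table lives in the same file, a specification change is a one-line edit followed by
a reload, which is the maintenance benefit described above.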

Linking Data Elements, Files and Specifications

Related  data  residing  on  various  files  must  be  linked  together  by  key  fields  before  it  can  be 
compared  and  edited.    Problems  are  encountered  when  multiple  files  are  involved,  each  requiring 
linkage  to  another  file  with  a  different  set  of  key  variables.    After  file  linkage  is  completed,  the  data 
specifications  must  be  linked  into  the  correct  observations  for  comparison  and  data  editing.    Here  the 
database management capabilities of SAS become important.

The  SAS  features  used  in  linking  data  files,  records,  data  elements  and  specifications  include: 

•  Multiple  key  fields  - 

SAS  allows  any  variable  on  a  SAS  file  to  be  used  as  a  key  field.    Any  field  which  is  common  to 
two  or  more  files  can  be  used  to  link  and  join  observations  on  these  files. 

•  SAS  sort  and  merge  procedures  - 

The  use  of  the  SAS  sort  and  merge  procedures  facilitates  the  linking  of  observations  from  various 
files. 

•  SAS  files  -  variable  length  record  emulation  - 

By storing sets of data in multiple files and then linking these files as needed, variable length
record data storage can be emulated in SAS. All observations in a given SAS file are the
same length, which is inefficient as far as space utilization is concerned. Redundant fields can be
summarized and stored on a separate file to save space.

•    Temporary  work  files  -

Linkages  between  observations  can  be  made  at  the  time  of  the  edits  in  temporary  work  files. 
After  data  is  extracted  from  the  permanent  files  it  is  then  linked  as  required  for  the  particular 
edits  to  be  performed.    Once  the  edits  are  performed,  the  temporary  file  containing  the  linked 
data  is  erased  -  this  ensures  that  workspace  will  be  available  for  later  use. 

The  current  application  consists  of  a  large  collection  of  variable  length  data  records  linked  together 
by  various  keys.    That  is,  separate  data  files  are  used  to  partition  the  redundant  sections  of  records. 
Through  careful  file  design,  the  large  and  inefficient  data  structures  are  efficiently  stored  and 
accessed  in  SAS  files. 

File design is very important. SAS files can be inefficient if not designed carefully with the
applications  in  mind.    Large  amounts  of  data  may  become  difficult  and  costly  to  maintain.    Efficient 
storage  can  be  obtained  through  variable  length  record  emulation.    Many  of  the  data  forms  received 
from  the  labs  contain  multiple  observations  with  the  same  value  for  a  specific  field.    For  example, 
header  information  applies  to  many  detail  records.    Redundant  fields  can  be  extracted  and  stored  just 
once in a separate file and merged back when needed using the proper keys. This technique
considerably reduces storage space requirements and makes processing the data for specific
applications more efficient. There are some drawbacks in certain data applications; these include the
need for additional logical data manipulation operations.

An  example  of  this  variable  length  record  emulation  in  SAS  files  is  shown  by  the  design  of  the  SAS 
files  for  'Form  I'.    Data  received  from  the  laboratories  on  'Form  I'  actually  consists  of  3  types  of 
data  (see  example  of  Form  I  in  Appendix  C).    The  least  efficient  (but  simple)  SAS  representation  of 
this file would be one SAS observation per page of the form. Keeping in mind how the data will
be processed and, more importantly, storage efficiency, the data may be partitioned into header
information  and  compound  results.    Further  analysis  of  the  forms  shows  that  much  of  the  header 
information  on  each  page  is  repetitive  and  can  be  summarized  and  stored  in  a  separate  file  partition. 
Ultimately  four  files  resulted  from  the  partitioning  of  'Form  I':  Header  Information,  Results,  TIC 
Results,  and  SDG  Information  (see  examples  of  these  files  in  Appendix  D).    These  four  files  are 
processed independently or merged together to create the original form format. The impact on
storage  efficiency  is  dramatic  compared  to  the  single  record  case.    Storage  requirements  are  reduced 
by  a  factor  of  four. 
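
The partitioning idea can be sketched as follows in Python. This is an illustration under assumed
field names (sample, lab, matrix, compound, result), not the actual layout of the 'Form I' files,
and the functions partition and merge_back are hypothetical.

```python
def partition(records, key, header_fields):
    """Split flat records into one header per key value plus slim detail records."""
    headers = {}
    details = []
    for rec in records:
        # Store the repeated header fields only once per key value.
        headers[rec[key]] = {name: rec[name] for name in header_fields}
        # Keep the key and the remaining fields on the detail record.
        details.append({k: v for k, v in rec.items() if k not in header_fields})
    return headers, details


def merge_back(headers, details, key):
    """Rebuild the flat records by joining each detail record to its header."""
    return [{**detail, **headers[detail[key]]} for detail in details]


# Hypothetical flat records: the header fields repeat on every result line.
flat = [
    {"sample": "S1", "lab": "LAB01", "matrix": "WATER", "compound": "110", "result": 25.0},
    {"sample": "S1", "lab": "LAB01", "matrix": "WATER", "compound": "120", "result": 14.0},
    {"sample": "S2", "lab": "LAB01", "matrix": "SOIL", "compound": "110", "result": 80.0},
]

headers, details = partition(flat, "sample", ["lab", "matrix"])
```

The cost of the saving is the extra join in merge_back, which corresponds to the additional data
manipulation operations mentioned earlier as a drawback.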

Extensive editing and data checking demand that files be built with multiple linkages.
Each  observation  on  each  file  has  explicit  linkages  to  observations  on  every  other  file  to  which  it 
must  be  linked.    Key  fields  which  link  units  of  data  differ  depending  on  the  units  being  linked. 
This  can  be  demonstrated  in  the  four  'Form  I'  files  mentioned  above  (see  Appendix  D). 

The basic unit of analysis for this application is the sample delivery group (SDG). During the
editing process, all data for a given SDG (SDG is the highest level key on all files) is extracted
from the master files and placed in temporary work files. As shown in the example files in
Appendix D, each file contains the master SDG key (SDG_NO). Linking files 1, 2 and 3 together
(after extraction by SDG) requires only a simple merge by the variable 'SAMPLE'. Variables from
file 4 (SDG information) are merged in by the variable 'SDG_NO'. Other data files, however, do not
contain 'SAMPLE' and must be linked by additional variables. For example, file 5 (instrument tuning
information) is linked to file 1 by 'SAMPLE'. File 6 (instrument calibration information) is linked to
file 5 by instrument identifier (IN_IDF5, IN_IDF6). The only way to link file 1 to file 6 is by
using file 5 as an intermediate link. One file 6 observation applies to many file 1 observations; in
other words, one 'tune' applies to many 'samples'. This example begins to show some of the
complexity of the data.
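
The indirect linkage from file 1 to file 6 can be sketched as below. The example is in Python and
uses dictionary lookups in place of SAS sorts and merges. The key names SAMPLE, IN_IDF5 and
IN_IDF6 come from the text; the data values, the CAL_OK field and the function name are
hypothetical.

```python
# Hypothetical extracts of files 1, 5 and 6 for one SDG.
file1 = [{"SAMPLE": "S1"}, {"SAMPLE": "S2"}, {"SAMPLE": "S3"}]
file5 = [
    {"SAMPLE": "S1", "IN_IDF5": "GC01"},
    {"SAMPLE": "S2", "IN_IDF5": "GC01"},
    {"SAMPLE": "S3", "IN_IDF5": "GC02"},
]
file6 = [
    {"IN_IDF6": "GC01", "CAL_OK": True},
    {"IN_IDF6": "GC02", "CAL_OK": False},
]


def link_samples_to_calibration(file1, file5, file6):
    """Attach the file 6 record to each file 1 record, going through file 5."""
    tune_by_sample = {row["SAMPLE"]: row for row in file5}
    cal_by_instrument = {row["IN_IDF6"]: row for row in file6}
    linked = []
    for row in file1:
        tune = tune_by_sample[row["SAMPLE"]]      # file 1 -> file 5 by SAMPLE
        cal = cal_by_instrument[tune["IN_IDF5"]]  # file 5 -> file 6 by instrument
        linked.append({**row, **tune, **cal})
    return linked


linked = link_samples_to_calibration(file1, file5, file6)
```

Here one file 6 record is attached to two file 1 records, which is the one-to-many situation
described in the text.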

Observations  in  each  file  for  each  SDG  must  be  checked  against  corresponding  fields  on  other  files. 
In  some  cases  a  single  observation  on  one  file  may  apply  to  multiple  observations  on  another  file. 
Using  SAS  sorts  and  merges,  each  unit  of  data  can  be  temporarily  linked  to  its  corresponding  data 
on any other file, the edits applied, and error records generated.

Storing and Reporting Results

Reporting and storing results from complex multi-leveled edits can be very complicated. Multiple
error records may be generated from a single data problem, and determining which piece of data
generated the error can be difficult. The error record must contain all information necessary to fully
explain the nature of the error and indicate the location of the observation on which the error
resides. Error reports must be written to summarize the errors in a convenient and usable form
which can be understood by the user.

The  major  SAS  features  used  in  results  reporting  include: 

•  SAS  format  tables  - 

SAS  format  tables  are  useful  for  storing  error  message  text  information. 

•  Data  manipulation  - 

The  ease  with  which  files  and  observations  can  be  handled  in  SAS  allows  various  error  reports 
and  summaries  to  be  produced  from  the  error  records. 

•  SAS  functions  - 

SAS  functions  allow  the  manipulation  of  results  for  reporting  purposes. 

•  Conditional  outputs  - 

Conditional  output  statements  aid  in  manipulating  data  and  files. 

In  the  current  application,  as  each  error  or  discrepant  data  field  is  encountered  an  error  record  is 
generated.    This  error  record  contains  all  necessary  information  to  fully  document  the  nature  of  the 
error.    This  includes  the  variable  which  caused  the  error,  the  related  variable  from  which  the 
comparison  was  made,  the  location  of  the  observation  on  which  the  error  resides,  the  date  and  time 
at  which  the  edits  were  executed  and  the  value  which  the  field  should  be  (if  it  is  possible  to 
calculate).    This  information  allows  detailed  error  reports  to  be  produced  and  also  allows  the 
discrepant  record  to  be  located  and  corrected  if  desired. 

Since many error records can be produced, the error file needs to use space as efficiently as possible.
Multiple error messages may need to be created to fully explain a data problem. As the data is
checked, an error record is output each time an error is encountered. A discrepant variable on one
observation may cause multiple errors to be produced on related observations. For example, in this
application, if an instrument calibration run is bad, all samples associated with this calibration run are
flagged as in error.

The  structure  of  the  error  record  is  shown  below. 

Key Fields:           SDG number
                      Case number
                      Sample number
                      Date and time of edit
                      Additional key field (if necessary)

Error Information:    Error code
                      Related observation and field name
                      (from variable comparisons)

Error Correction:     Current value of discrepant field
                      Correct value of discrepant field

SAS  format  tables  are  used  to  store  the  text  for  the  error  code  which  explains  the  nature  of  the 
error.    The  following  is  an  example  of  the  text  storage  for  several  error  codes. 

Error Code    Text Message

 980          Injection date
 990          Instrument identifier
1000          Column
1010          Percent decanted incorrect for standard
1020          Preliminary sequence incorrect
1030          Excessive standards between standards
1040          Date sample received outside limits
1050          Sequence not in chronological order
1060          12 hour standard exceeds limits

The table consists of a unique error code or error number followed by a text string which, when
combined with the other error information on the error record, tells the nature of the error and its
location on the files. These text messages are decoded from the error code for reporting purposes.
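
Decoding of error codes for reporting might look like the following Python sketch. The fields of
ErrorRecord loosely follow the error record structure shown earlier and the three messages are
taken from the example table; the function name, the sample values and the wording of the report
line are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# A few entries from the error message table shown above.
ERROR_TEXT = {
    980: "Injection date",
    1040: "Date sample received outside limits",
    1050: "Sequence not in chronological order",
}


@dataclass
class ErrorRecord:
    sdg_no: str                          # key fields
    case_no: str
    sample: str
    edit_time: str
    error_code: int                      # error information
    related_field: Optional[str] = None
    current_value: Optional[str] = None  # error correction
    correct_value: Optional[str] = None


def describe(rec):
    """Decode the error code into text for one line of an error report."""
    text = ERROR_TEXT.get(rec.error_code, "Unknown error code")
    return f"SDG {rec.sdg_no}, sample {rec.sample}: error {rec.error_code}, {text}"


example = ErrorRecord(sdg_no="A001", case_no="C100", sample="S1",
                      edit_time="1988-05-26 10:00", error_code=1040)
```

Only the short code is stored on each error record, so the message text is held once in the table
rather than repeated on every record.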

To  produce  usable  error  reports,  the  error  records  for  a  particular  set  of  data  are  selected,  analyzed, 
and  summarized.    As  shown  on  the  sample  error  report  (see  Appendix  E)  the  report  is  summarized 
by  sample  and  form  number.    For  each  error,  enough  information  is  presented  to  the  user  to  locate 
the  field  in  error  on  the  hardcopy  forms.    Note  that  redundant  errors  are  summarized  and  not 
displayed.    Additional  fields  are  stored  on  the  error  record  but  are  not  needed  on  this  report.    These 
additional  fields  will  be  used  later  to  locate  and  update  the  discrepant  variables. 

Error  Identification 

Once  error  specifications  are  designed  and  implemented,  data  can  be  evaluated  accordingly.    Each 
machine  readable  record  and  data  element  is  passed  against  the  pre-programmed  specifications  using 
SAS.    Response  fields  are  checked  as  to  allowable  values,  consistency  and  substantial  relationships 
between data elements. Error report listings are generated for data that do not conform to cleaning
specifications. Errors must be identified unambiguously according to location in the original hardcopy
document or package. Such identification requires the use of multiple keys. The original hardcopy
documents, existing policy decisions and queries to the generator of this source data may be used to
determine the nature of the error and the appropriate method of resolution.

Major  activities  performed  under  the  error  identification  system  include: 

• Reformatting and restructuring of data,

• Retrieval and comparison of tolerance specifications with the data,

• Writing data checks, and

• Unambiguously identifying errors.

Primary  SAS  features  used  include  format  tables,  multiple  file  keys,  functions,  macros,  conditional 
output,  temporary  work  files,  missing  data  features  and  automatic  character/numerical  conversion. 

Reformatting  and  Restructuring  the  Data 

Before data can be checked against tolerance specifications it must first be organized in a form
allowing related fields to be compared. With certain types of data this is a complex problem.
Analytical results data are reduced from 80 raw data record types to 16 partitioned files (SAS files).
These 16 files must be restructured and merged for the various edits.

The major SAS features used to restructure the data are:

•  Conditional  output  to  temporary  work  files 

•  File  sorting  and  merging 

In the current application, data are extracted from the permanent files by SDG (sample delivery
group). A batch job is submitted to edit one SDG; it reads through the permanent files and
extracts all data for this one SDG. Using conditional outputs, the extracted records are written to
temporary work files. All data necessary to execute the edits are then available to the program. The
work files are then structured (through sorting and merging) as needed for each edit.
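
The extraction step can be illustrated with a minimal Python sketch. It assumes that each
permanent file is a list of records carrying the SDG_NO key; the file names, field values and the
function extract_sdg are hypothetical.

```python
def extract_sdg(permanent_files, sdg_no):
    """Copy the records of one SDG from every permanent file into work files."""
    work_files = {}
    for name, records in permanent_files.items():
        # Conditional output: keep a record only if it belongs to the SDG.
        work_files[name] = [rec for rec in records if rec["SDG_NO"] == sdg_no]
    return work_files


# Hypothetical permanent files holding two sample delivery groups.
permanent = {
    "header": [
        {"SDG_NO": "A001", "SAMPLE": "S1"},
        {"SDG_NO": "B002", "SAMPLE": "S9"},
    ],
    "results": [
        {"SDG_NO": "A001", "SAMPLE": "S1", "RESULT": 25.0},
        {"SDG_NO": "A001", "SAMPLE": "S1", "RESULT": 31.5},
        {"SDG_NO": "B002", "SAMPLE": "S9", "RESULT": 12.0},
    ],
}

work = extract_sdg(permanent, "A001")
```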

Retrieval  and  Comparison  of  Specifications  with  the  Data 

Hundreds  of  variables  are  edited,  each  of  which  may  contain  multiple  discrete  values  or  ranges  of 
values.    The  retrieval  of  the  data  specifications  for  each  variable  must  be  integrated  with  the  edit 
process to compare these specifications with the data fields. Through the use of SAS format tables
to store data specifications, the retrieval of these specifications can be done as needed throughout the
edit process.

The SAS features used to retrieve and compare specifications with the data are:

•  SAS  format  tables  - 

SAS format tables allow quick retrieval of data specifications.

•  SAS  functions  - 

The SAS 'SUBSTR' and 'INDEX' functions are useful in manipulating data specification
strings. The 'PUT' function allows retrieval of data specifications from format tables.

•  SAS  character/numeric  conversion  handling  - 

SAS  will  automatically  convert  a  character  string  to  a  numeric  field  (and  vice  versa)  if  it  is 
necessary  for  a  data  comparison. 

In  this  application  table  specifications  are  retrieved  from  the  format  tables  using  both  the  SAS  'PUT' 
and  'SUBSTR'  functions.  The  following  sample  of  SAS  code  shows  how  the  low  and  high  limits  for 
a  particular  variable  for  one  compound  are  extracted  from  the  format  file. 

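     CRQL  = PUT(CAS_NO,$CRQL.);    /* look up the limit string for this compound */
     LCRQL = SUBSTR(CRQL,2,5);      /* low limit: 5 characters from position 2    */
     HCRQL = SUBSTR(CRQL,8,5);      /* high limit: 5 characters from position 8   */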

The "PUT" function takes the compound code stored in the variable "CAS_NO" and retrieves the
values for that compound from the SAS format table named "$CRQL". The variable CRQL
contains the entire set of limits for that chemical compound. CRQL is divided into the low and
high limits through the use of the SAS 'SUBSTR' function. Variables LCRQL and HCRQL now
contain the low and high limits for a given variable and can be used for comparisons.
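
For readers less familiar with SAS, the same lookup and slicing can be sketched in Python. SAS
positions are 1-based, so SUBSTR(CRQL,2,5) corresponds to the slice [1:6] and SUBSTR(CRQL,8,5) to
[7:12]. The table contents and the helper make_spec are hypothetical; only the two field
positions are taken from the SAS code above.

```python
def make_spec(low, high):
    """Build a 12-character specification string (hypothetical layout)."""
    # Position 1 is blank, positions 2-6 hold the low limit,
    # position 7 is blank, and positions 8-12 hold the high limit.
    return f" {low:>5} {high:>5}"


# Stand-in for the format table: compound code -> specification string.
CRQL_TABLE = {
    "110": make_spec(10, 330),
    "130": make_spec(50, 340),
}


def limits(cas_no):
    """Return (low, high) for one compound code."""
    crql = CRQL_TABLE[cas_no]   # like PUT(CAS_NO,$CRQL.)
    lcrql = crql[1:6]           # like SUBSTR(CRQL,2,5)
    hcrql = crql[7:12]          # like SUBSTR(CRQL,8,5)
    # The stored values are character, so convert explicitly before comparing.
    return float(lcrql), float(hcrql)
```

The explicit float conversion marks the step that SAS carries out automatically when a character
value is used in a numeric comparison.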

Another  feature  of  SAS  which  is  very  helpful  in  coding  edits  and  data  comparisons  is  the  automatic 
numeric/character  and  character/numeric  conversion.    Fields  will  be  automatically  converted  by  SAS 
if  needed.    This  feature  is  helpful  when  retrieving  values  from  format  tables  because  all  fields  stored 
in  format  tables  are  character. 

Writing  Data  Checks 

Writing  the  code  to  check  the  data  against  other  data  elements  and  against  specifications  is  probably 
the  most  complex  task  in  the  entire  editing  process.    The  logic  involved  and  amount  of  code  required 
to  perform  the  edits  is  substantial.    The  SAS  features  used  to  write  data  checks  include: 

•  SAS  functions  - 

SAS maintains an extensive library of character string manipulation, numeric and date/time
handling functions. The use of built-in SAS functions can greatly simplify the storage, retrieval,
and comparison of tabled specifications.

•  Processing  control  features  - 

The  SAS  MACRO  facility,  %INCLUDE,  LINK,  and  conditional  output  statements  give  added 
control  over  program  processing. 

•    SAS  missing  data  feature  -

Empty  or  missing  values  can  be  recognized  by  SAS  by  using  special  program  statements  which 
help  simplify  the  handling  of  missing  data. 

Many SAS features were used to write data checks for this application. The large number of data
fields to be checked and the varied nature of the checks and data require various techniques to
make the processing more efficient.

Functions greatly facilitate date/time handling, which can be difficult and time consuming. SAS date
and  time  functions  can  read,  store,  and  format  date  and  time  variables  in  a  number  of  formats. 

SAS string manipulation functions like the 'INDEX' and 'INDEXC' functions are used to search
strings  for  specific  characters  or  character  strings.    SAS  'LEFT'  and  'RIGHT'  functions  are  used  to 
justify  data  within  fields  before  comparisons.    The  'COMPRESS'  function  is  used  to  remove  certain 
characters  from  fields  and  the  'TRIM'  function  is  used  to  remove  blank  characters. 

SAS macros, the %INCLUDE statement and the LINK statement are used mainly for the execution of
redundant data checks. Writing one piece of code and calling it repeatedly (passing different
parameters as needed) greatly reduces the amount of coding. GO TO statements, the
IF-THEN-ELSE statement, and various types of DO loops are used to reduce run time and make
the programs run more efficiently. These statements are used to bypass small sections of code which
need conditional execution.

Larger  sections  of  code  can  be  skipped  by  the  use  of  macros  and  global  variables.    In  this 
application,  for  example,  there  are  three  general  groupings  of  chemical  compounds  called  fractions. 
Certain  edits  are  done  on  all  three  fractions,  and  some  are  done  on  one  specific  fraction  only.    Data 
packages  received  may  contain  only  one  or  two  of  the  three  possible  fractions.    Code  bypasses  are 
used  to  save  processing  and  avoid  the  program  executing  code  on  edits  for  which  there  are  no  data. 
By checking the data to see which fractions are present and storing this flag in a global variable,
decisions  to  skip  entire  edits  can  be  made.    The  following  example  of  SAS  code  shows  how  this  is 
done. 

DATA CHECK;
   SET FILE9(KEEP=ONEVAR) END=EOF;
   IF EOF THEN DO;
      IF ONEVAR NE ' ' THEN DO;
         CALL SYMPUT('GLOBVAR','YES');
      END;
   END;
RUN;

The above example of code reads a file to see if any data are present. If there are data present, then
'YES' is assigned to the global SAS variable 'GLOBVAR'. Later, the editing program executes the
following macro to process the desired section of code.

%MACRO FILE9;
   %IF &GLOBVAR = YES %THEN %DO;
      %INCLUDE LIB(FILE9);
   %END;
%MEND FILE9;

This macro checks the global variable 'GLOBVAR' and, if it is set to 'YES' (which means there are
data present to be edited), brings in the code necessary to edit these data (in this example, file 9 data)
and executes it. Since the code for editing file 9 is over 500 lines, skipping over this code when
there are no file 9 data can save considerable processing time.
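
The same bypass technique can be sketched in a self-contained form. This variant is an illustration rather than the production code: it gives the macro variable a default value so that it always resolves, and it writes a message where the real program would include the edit code. The data set contents and the macro name EDIT9 are assumptions.

```sas
%LET GLOBVAR = NO;        /* default, so the macro variable always exists */

DATA FILE9;               /* hypothetical stand-in for the file 9 data    */
   ONEVAR = 'X';
RUN;

DATA _NULL_;
   SET FILE9(KEEP=ONEVAR);
   IF ONEVAR NE ' ' THEN DO;
      CALL SYMPUT('GLOBVAR', 'YES');
      STOP;               /* one nonblank value is enough                 */
   END;
RUN;

%MACRO EDIT9;
   %IF &GLOBVAR = YES %THEN %DO;
      %PUT FILE 9 DATA PRESENT - RUNNING FILE 9 EDITS;
      /* the real program would %INCLUDE the file 9 edit code here */
   %END;
   %ELSE %PUT NO FILE 9 DATA - EDITS SKIPPED;
%MEND EDIT9;
%EDIT9
```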

Processing  missing  data  can  be  difficult  unless  the  software  being  used  has  features  to  deal  with 
missing  variables.    SAS  has  built-in  mechanisms  to  process  missing  and  blank  data  fields.    These 
mechanisms  free  the  programmer  from  writing  the  code  to  handle  missing  data  problems.    Most  SAS 
functions  and  procedures  automatically  handle  missing  data  fields. 
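
A short sketch of this behavior, using made-up response factor values, is shown below. The variable names are assumptions chosen to resemble those in the calibration file.

```sas
/* Hypothetical values: RRF2 is deliberately missing */
DATA _NULL_;
   RRF1 = 0.42;
   RRF2 = .;
   RRF3 = 0.38;
   AVG_RRF = MEAN(RRF1, RRF2, RRF3);    /* MEAN ignores missing arguments */
   NVALID  = N(RRF1, RRF2, RRF3);       /* count of nonmissing arguments  */
   NMISSED = NMISS(RRF1, RRF2, RRF3);   /* count of missing arguments     */
   IF RRF2 = . THEN PUT 'RRF2 IS MISSING';
   PUT AVG_RRF= NVALID= NMISSED=;
RUN;
```

The average is computed from the two values present, and the counts let an edit decide whether enough values were reported.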

Identifying  the  Nature  and  Location  of  the  Error 

Identifying the exact nature of an error and determining in which file the observation containing the
error  can  be  found  is  a  critical  issue.    Reliable  error  reports  and  error  resolution  can  be  performed 
only  if  error  identification  is  unambiguous.    The  major  SAS  features  used  in  identifying  the  nature 
and  location  of  an  error  include: 

•  Data  manipulation  features 

•  SAS  format  tables 


The  nature  of  an  error  is  determined  as  precisely  as  possible  during  the  edit  using  field  comparisons 
and double-checking of values against related fields. As much information as possible is gathered about
each  error.    This  information  is  stored  on  a  permanent  error  file  for  subsequent  reporting  and  data 
correction  purposes. 

In  the  current  application,  a  40  character  text  field  containing  a  description  for  each  2  character  error 
code  is  stored  in  a  format  table.    The  error  code  written  to  the  error  record  is  'decoded'  or 
'formatted' to this 40 character text field in the error report.
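
The decoding step can be sketched as follows. The format name $ERRTXT is hypothetical, and only two codes are shown; the label texts are taken from the sample error table in Appendix A.

```sas
/* Hypothetical format: a small subset of the error code table */
PROC FORMAT;
   VALUE $ERRTXT
      '10'  = 'ANALYSIS DATE'
      '20'  = 'ANALYSIS DATE-BLANK'
      OTHER = 'UNKNOWN ERROR CODE';
RUN;

DATA _NULL_;
   LENGTH ERRCODE $2 ERRTEXT $40;
   ERRCODE = '20';
   ERRTEXT = PUT(ERRCODE, $ERRTXT.);   /* decode the 2 character code */
   PUT ERRCODE= ERRTEXT=;
RUN;
```

Keeping the descriptions in a format table means the error records store only the short code, and the wording of a message can be changed in one place.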

Error  Resolution 

Records  or  data  elements  generating  error  messages  are  examined  during  the  error  resolution  process. 
When  the  discrepancy  that  triggered  the  error  message  is  resolved,  the  record  or  data  element  is 
updated  and  the  resolution  re-checked.    Eventually,  the  error  message  is  resolved  by  correcting  or 
deleting  the  data  offending  the  specification,  overriding  or  ignoring  the  specification  for  that 
particular  case  or  sample,  or  flagging  the  data  value  as  suspicious  and  overriding  the  specification. 
Copies of all file updates and data flags, the final version of the specifications and the originally
received  data  are  kept  for  documentation.    An  evaluation  of  these  ancillary  files  can  produce  an  audit 
of  the  quality  assurance  and  control  process. 

Major  activities  involved  in  error  resolution  include: 

•  Identification  and  resolution  of  errors.

•  Correction  of  the  error  records. 

•  Application  of  corrections  to  the  data. 

Primary  SAS  features  used  in  this  activity  are  format  tables,  conditional  output,  temporary  work  files, 
SAS  report  feature,  update  procedure  and  full  screen  file  editing. 

Identification  and  Resolution  of  Key  Errors 

Through  the  use  of  properly  designed  error  reports,  errors  can  be  identified  and  resolved.    In  certain 
types  of  edits,  a  single  error  can  generate  multiple  error  records.    It  is  important  that  the  error 
reports  unambiguously  identify  the  underlying  or  primary  error  so  that  a  single  correction  will  correct 
the associated errors. Sometimes this will be impossible. A single error source may not be
identified,  in  which  case  the  errors  are  corrected  and  the  edits  rerun.    Several  iterations  of  the 
edit-correction  process  may  be  necessary  to  correct  all  errors. 

The  SAS  features  used  to  identify  and  resolve  errors  are: 

•  SAS  report  writing 

•  Data  manipulation  features 

In  this  application,  error  reports  are  generated  using  the  SAS  report  writing  procedure  (see  example 
error  report  in  Appendix  E).    The  SAS  print  procedure  can  completely  format  a  detailed  report  with 
a  minimum  of  coding.    Various  SAS  data  manipulation  features  are  used  to  extract  and  'prepare'  the 
data  before  they  are  passed  to  the  print  procedure.    These  error  reports  are  used  to  locate  the 
specific  error  in  the  hardcopy  data  forms  where  a  decision  can  be  made  about  its  correction  and 
resolution. 
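
A minimal sketch of such a report is shown below. The data set, variable names, and values are assumptions for illustration and do not reproduce the layout of the report in Appendix E.

```sas
/* Hypothetical error records */
DATA ERRORS;
   INPUT SAMPLE $ 1 FORM $ 3-4 ERRNUM 6-9 CURVAL $ 11-20;
   DATALINES;
A 6A  430 MISSING
A 6A  903 .307
B 6A  960 133
;
RUN;

PROC SORT DATA=ERRORS;
   BY SAMPLE FORM;
RUN;

/* One report section per sample and form */
PROC PRINT DATA=ERRORS;
   BY SAMPLE FORM;
   VAR ERRNUM CURVAL;
   TITLE 'ERROR REPORT';
RUN;
```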

Correction  of  the  Error  Records 

Correction  of  the  error  records  requires  user  input  through  on-line  and  batch  processing.    Specific 
error  records  must  be  located  and  presented  to  the  user  for  updates.    The  user-input  corrections  are 
stored  on  the  error  records  until  the  data  files  are  updated. 

The  major  SAS  feature  used  in  making  corrections  to  the  error  records  is  SAS  full  screen  file 
editing. 

In  the  current  application,  errors  are  being  resolved,  but  the  corrected  values  are  not  being  entered 
on  the  files.    Due  to  the  large  volume  of  data  being  received,  the  manpower  necessary  to  execute 
this  error  correction  process  is  not  yet  available.    When  the  correction  process  is  implemented,  error 
reports  will  be  displayed  to  the  user  through  the  use  of  SAS  Full  Screen  editing.    This  SAS  facility 
allows  quick  access  to  SAS  files  through  full-screen  terminals.    One  advantage  of  using  this  facility  is 
that SAS will automatically build the file display screen for the user. Updates made to this file
through  full  screen  editing  are  input  directly  to  the  permanent  error  file  for  storage. 

Application  of  the  Corrections  to  the  Data 

Applying  corrections  to  the  data  involves  finding  the  file  and  observation  on  which  the  error  exists 
and  replacing  the  incorrect  field  with  the  user  input  correction.    This  is  done  by  building  into  the 
error  record  the  necessary  keys  to  locate  the  observation  which  generated  the  error.    The  SAS 
features  used  to  apply  the  corrections  to  the  data  include: 

•  SAS  file  sorting  and  merging 

•  SAS  update  procedure 

As stated above, corrections are currently not being applied to the data files. When this process is
implemented,  the  corrected  fields  on  the  error  records  will  be  applied  to  the  data  with  the  use  of  the 
SAS  'UPDATE'  procedure.    This  procedure  will  automatically  update  the  value  of  any  number  of 
variables  on  one  file  with  the  values  of  corresponding  variables  on  another  file.    The  error  records 
must  first  be  'prepared'  for  the  update  using  some  data  manipulation,  sorts,  and  merges. 
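
Since this step is not yet implemented, the following is only a sketch of how an update of this kind could look, using the DATA step UPDATE statement. The data sets, key variables, and values are hypothetical.

```sas
/* Hypothetical master file of results */
DATA RESULTS;
   INPUT SAMPLE $ 1-4 CAS_NO $ 6-15 RESULT1;
   DATALINES;
A001 74873      12.5
A001 75014      430
A002 74873      8.1
;
RUN;

/* Hypothetical corrections taken from resolved error records */
DATA FIXES;
   INPUT SAMPLE $ 1-4 CAS_NO $ 6-15 RESULT1;
   DATALINES;
A001 75014      43.0
;
RUN;

PROC SORT DATA=RESULTS; BY SAMPLE CAS_NO; RUN;
PROC SORT DATA=FIXES;   BY SAMPLE CAS_NO; RUN;

/* Nonmissing values in FIXES replace the matching values in RESULTS */
DATA RESULTS;
   UPDATE RESULTS FIXES;
   BY SAMPLE CAS_NO;
RUN;
```

The keys stored on each error record play the role of SAMPLE and CAS_NO here: they identify exactly one observation in the master file, which UPDATE requires.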


Conclusions 

Given  the  nature  of  the  data  and  various  operational  requirements,  SAS  is  an  effective  system 
language for designing a data quality assurance process. In the area of scientific and technical data
collection  and  processing,  a  greater  variety  and  scope  of  data  occurs  than  in  business  applications. 
Data  is  often  collected  from  multiple  sources,  on  different  forms  or  represented  in  variable  or 
relational  structures.    Each  new  group  of  data  may  have  different  data  problems  and  require  the 
modification  or  design  of  new  error  detection  routines  or  reports.    Such  systems  tend  to  be  dynamic 
and changes in data and data formats are frequent.

At present over 1500 data packages have been processed using the described approach. The system
appears  to  be  meeting  all  user  requirements.    In  addition  to  the  QA/QC  processing  discussed  in  this 
paper, the system supports data acquisition, electronic data transfer, file management and data archival,
all of which are performed by non-programming personnel. The system also has links to other data
base management systems including Natural/ADABAS and FOCUS and has been linked into a
previously existing financial system written in SAS.

The QA/QC support system has been operating in production mode since late 1987. The final
phase, data correction, will go into production in late 1988. All modifications to this system,
including the release of enhancements, have been made without interruption of production processing.
One by-product of the system has been the availability of the data to various users. A large
quantity of analytical results data has now been accumulated. Since the data are maintained in
SAS files, statistical analysis by users is greatly facilitated.

SAS has proven to be an appropriate language for supporting generalized error identification and
resolution  tasks.    Once  a  quality  assurance  and  control  system  is  structured,  knowledgeable  users  can 
design  and  review  data  checks.    Statements  may  be  added,  modified,  or  deleted  easily  and  quickly. 
The  same  general  system  features  and  structure  can  be  extended  to  other  data  and  data  acquisition 
efforts.    If  the  system  becomes  more  static  and  routine,  patching  in  a  procedural  language  such  as 
PL/1  or  C  can  easily  be  done  to  optimize  code  and  increase  operational  efficiency.    The  techniques 
and  approach  discussed  will  be  used  for  future  applications  which  require  extensive  data  quality 
control and editing procedures prior to storage and utilization.


Winter    I  i 


iassist   quarterly 


43 


Appendix   A   :   Five  SAS  tables  stored   in   one  text  file 


TABLE  VALUE       COMMENT


TABLE #1: ERRORS

TABLE  ERROR CODE  ERROR TEXT
01     10          ANALYSIS DATE
01     20          ANALYSIS DATE-BLANK
01     30          ANALYSIS DATE-BLANK FOR COLUMN 2
01     31          ANALYSIS DATE- FOR COLUMN 2
01     40          ANALYSIS DATE- HEADER
01     50          ANALYSIS DATE-CONT. CALIBRATION

TABLE #2: $SURRQC - SURROGATE RECOVERY ADVISORY QC LIMITS

CODE - 2A -]- 2B -]- 2C -]- 2D -]- 2E -]- 2F -

TABLE  SURR
02     S1      088110081117035114023120024154020150
02     S2      086115074121043116030115
02     S3      076114070121033141018137
02     S4      010094024113
02     S5      021100025121
02     S6      010123019122


TABLE #3: $MSMSDQC - MATRIX SPIKE/MATRIX SPIKE RECOVERY QC LIMITS

TABLE  CAS NO   RPD  REC-REC  RPD  REC-REC  ORD
03     75354    014  061 145  022  059 172  01
03     79016    014  071 120  024  062 137  02
03     71432    011  076 127  021  066 142  03
03     108883   013  076 125  021  059 139  04
03     108907   013  075 130  021  060 133  05


TABLE #4: $CMPDNAME - COMPOUND NAMES

VOLATILE COMPOUNDS

TABLE  CAS NO   COMPOUND NAME
04     74873    CHLOROMETHANE
04     74839    BROMOMETHANE
04     75014    VINYL CHLORIDE
04     75003    CHLOROETHANE
04     75092    METHYLENE CHLORIDE


TABLE #5: $CCCSPCC - THIS IS FOR FORM 6 MIN RRF, MAX % RSD
AND FOR FORM 7 MIN RRF50, MAX % D

TABLE  CAS NO
07     74873
07     75014
07     75354
07     75343
07     67663
07     78875
07     75252
07     79345

RRFLIM   RSDLIM   RRF50LIM

30.0  30.0  0.250  0.30  30.0  30.0
25.0  25.0  0.30  25.0  25.0  0.250  0.30

Appendix B : SAS program to load SAS format tables


DATA TAB(KEEP=TABLE VALUE LABEL);
   INFILE TABLES;
   LENGTH TABLE 2. VALUE $10. LABEL $40.;
   KEEP TABLE VALUE LABEL;
   INPUT @2 TAB_ID $CHAR2. @7 VALUE $CHAR10. @19 LABEL $CHAR40.;
   /* KEEP ONLY DATA LINES: THE TABLE ID MUST CONTAIN A NONZERO DIGIT */
   IF INDEXC(TAB_ID,'123456789') GT 0 THEN DO;
      TABLE = TAB_ID;
      OUTPUT;
   END;

PROC SORT DATA=TAB;
   BY TABLE;

/* WRITE ONE PROC FORMAT STEP PER TABLE TO A TEMPORARY FILE */
DATA _NULL_;
   SET TAB END=EOF;
   BY TABLE;
   FILE TEMP1;
   IF FIRST.TABLE AND TABLE EQ 1 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE ERRORS';
   END;
   IF FIRST.TABLE AND TABLE EQ 2 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $SURRQC';
   END;
   IF FIRST.TABLE AND TABLE EQ 3 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $MSMSDQC';
   END;
   IF FIRST.TABLE AND TABLE EQ 4 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $CMPDNAME';
   END;
   IF FIRST.TABLE AND TABLE EQ 5 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $CCCSPCC';
   END;
   IF FIRST.TABLE AND TABLE EQ 6 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $CRQL';
   END;
   IF FIRST.TABLE AND TABLE EQ 7 THEN DO;
      PUT @1 'PROC FORMAT DDNAME=FORMAT;';
      PUT @1 'VALUE $CMPDTAB';
   END;

   /* ONE VALUE="LABEL" LINE PER TABLE ENTRY */
   PUT @3 VALUE $CHAR10. @15 '="' @17 LABEL $CHAR40. '"';
   IF LAST.TABLE THEN PUT ';';
   IF EOF THEN PUT ';';
RUN;

/* EXECUTE THE GENERATED PROC FORMAT STEPS */
OPTIONS DQUOTE;
%INCLUDE TEMP1;
RUN;

Appendix C : Data collection forms - form 1a


EPA  SAMPLE  NO. 


VOLATILE  ORGANICS  ANALYSIS  DATA  SHEET 


Lab  Name: 
Lab  Code: 


Case  No. : 


Contract: 
SAS  No. : 


SDG  No. 


Matrix:  (soil/water) 
Sample  wt/vol: 
Level:    (low/med) 
%  Moisture:  not  dec. 
Column:   (pack/cap) 

CAS  NO. 


_(g/mL)_ 


Lab  Sample  ID: 

Lab  File  ID: 

Date  Received: 

Date  Analyzed: 

Dilution  Factor: 

CONCENTRATION  UNITS: 
(ug/L  or  ug/Kg) 


74-87-3      Chloromethane
74-83-9      Bromomethane
75-01-4      Vinyl Chloride
75-00-3      Chloroethane
75-09-2      Methylene Chloride
67-64-1      Acetone
75-15-0      Carbon Disulfide
75-35-4      1,1-Dichloroethene
75-34-3      1,1-Dichloroethane
540-59-0     1,2-Dichloroethene (total)
67-66-3      Chloroform
107-06-2     1,2-Dichloroethane
78-93-3      2-Butanone
71-55-6      1,1,1-Trichloroethane
56-23-5      Carbon Tetrachloride
108-05-4     Vinyl Acetate
75-27-4      Bromodichloromethane
78-87-5      1,2-Dichloropropane
10061-01-5   cis-1,3-Dichloropropene
79-01-6      Trichloroethene
124-48-1     Dibromochloromethane
79-00-5      1,1,2-Trichloroethane
71-43-2      Benzene
10061-02-6   trans-1,3-Dichloropropene
75-25-2      Bromoform
108-10-1     4-Methyl-2-Pentanone
591-78-6     2-Hexanone
127-18-4     Tetrachloroethene
79-34-5      1,1,2,2-Tetrachloroethane
108-88-3     Toluene
108-90-7     Chlorobenzene
100-41-4     Ethylbenzene
100-42-5     Styrene
1330-20-7    Xylene (total)

FORM  I  VOA 

Appendix C : Data collection forms - form 1b


EPA  SAMPLE  NO. 


Lab  Name: 
Lab  Code: 


SEMIVOLATILE  ORGANICS  ANALYSIS  DATA  SHEET 

Contract: 

Case  No. :  SAS  No. : 


SDG  No. 


Matrix:  (soil/water) 
Sample  wt/vol: 
Level:    (low/med) 
%  Moisture:  not  dec. 


_(g/mL)_


dec. 


Extraction:   (SepF/Cont/Sonc) 
GPC  Cleanup:    (Y/N) pH: 


CAS  NO. 


COMPOUND 


Lab  Sample  ID: 

Lab  File  ID: 

Date  Received: 

Date  Extracted :_ 

Date  Analyzed: 

Dilution  Factor: 

CONCENTRATION  UNITS: 
(ug/L  or  ug/Kg) 


108-95-2     Phenol
111-44-4     bis(2-Chloroethyl)ether
95-57-8      2-Chlorophenol
541-73-1     1,3-Dichlorobenzene
106-46-7     1,4-Dichlorobenzene
100-51-6     Benzyl alcohol
95-50-1      1,2-Dichlorobenzene
95-48-7      2-Methylphenol
108-60-1     bis(2-Chloroisopropyl)ether
106-44-5     4-Methylphenol
621-64-7     N-Nitroso-di-n-propylamine
67-72-1      Hexachloroethane
98-95-3      Nitrobenzene
78-59-1      Isophorone
88-75-5      2-Nitrophenol
105-67-9     2,4-Dimethylphenol
65-85-0      Benzoic acid
111-91-1     bis(2-Chloroethoxy)methane
120-83-2     2,4-Dichlorophenol
120-82-1     1,2,4-Trichlorobenzene
91-20-3      Naphthalene
106-47-8     4-Chloroaniline
87-68-3      Hexachlorobutadiene
59-50-7      4-Chloro-3-methylphenol
91-57-6      2-Methylnaphthalene
77-47-4      Hexachlorocyclopentadiene
88-06-2      2,4,6-Trichlorophenol
95-95-4      2,4,5-Trichlorophenol
91-58-7      2-Chloronaphthalene
88-74-4      2-Nitroaniline
131-11-3     Dimethylphthalate
208-96-8     Acenaphthylene
606-20-2     2,6-Dinitrotoluene

FORM  I  SV-1 

Appendix C : Data collection forms - form 1c


EPA SAMPLE NO.


Lab  Name: 
Lab  Code: 


SEMIVOLATILE  ORGANICS  ANALYSIS  DATA  SHEET 

Contract: 

Case  No. :  SAS  No. : 


Matrix:  (soil/water) 
Sample  wt/vol : 
Level:    (low/med) 
%  Moisture:  not  dec. 


_(g/mL)_ 


dec. 


Extraction:   (SepF/Cont/Sonc) 
GPC  Cleanup:    (Y/N) pH: 


Lab  Sample  ID: 
Lab  File  ID: 
Date  Received: 
Date  Extracted : 
Date  Analyzed: 


CAS  NO. 


COMPOUND 


Dilution  Factor: 


CONCENTRATION UNITS:
(ug/L  or  ug/Kg) 


99-09-2      3-Nitroaniline
83-32-9      Acenaphthene
51-28-5      2,4-Dinitrophenol
100-02-7     4-Nitrophenol
132-64-9     Dibenzofuran
121-14-2     2,4-Dinitrotoluene
84-66-2      Diethylphthalate
7005-72-3    4-Chlorophenyl-phenylether
86-73-7      Fluorene
100-01-6     4-Nitroaniline
534-52-1     4,6-Dinitro-2-methylphenol
86-30-6      N-Nitrosodiphenylamine (1)
101-55-3     4-Bromophenyl-phenylether
118-74-1     Hexachlorobenzene
87-86-5      Pentachlorophenol
85-01-8      Phenanthrene
120-12-7     Anthracene
84-74-2      Di-n-butylphthalate
206-44-0     Fluoranthene
129-00-0     Pyrene
85-68-7      Butylbenzylphthalate
91-94-1      3,3'-Dichlorobenzidine
56-55-3      Benzo(a)anthracene
218-01-9     Chrysene
117-81-7     bis(2-Ethylhexyl)phthalate
117-84-0     Di-n-octylphthalate
205-99-2     Benzo(b)fluoranthene
207-08-9     Benzo(k)fluoranthene
50-32-8      Benzo(a)pyrene
193-39-5     Indeno(1,2,3-cd)pyrene
53-70-3      Dibenz(a,h)anthracene
191-24-2     Benzo(g,h,i)perylene

(1)  -  Cannot  be  separated  from  Diphenylamine 

FORM  I  SV-2 

Appendix C : Data collection forms - form 1d


Lab  Name: 
Lab  Code: 


ID 
PESTICIDE  ORGANICS  ANALYSIS  DATA  SHEET 

Contract: 

Case  No. :         SAS  No. : 


EPA SAMPLE NO.


SDG  No. 


Matrix: (soil/water)
Sample  wt/vol : 
Level:    (low/med) 
%  Moisture:  not  dec. 


_(g/mL)_ 


dec. 


Extraction:   (SepF/Cont/Sonc) 
GPC  Cleanup:    (Y/N) pH: 


CAS  NO. 


COMPOUND 


Lab  Sample  ID: 

Lab  File  ID: 

Date  Received: 

Date  Extracted :_ 

Date  Analyzed: 

Dilution  Factor: 

CONCENTRATION  UNITS: 
(ug/L  or  ug/Kg) 


319-84-6     alpha-BHC
319-85-7     beta-BHC
319-86-8     delta-BHC
58-89-9      gamma-BHC (Lindane)
76-44-8      Heptachlor
309-00-2     Aldrin
1024-57-3    Heptachlor epoxide
959-98-8     Endosulfan I
60-57-1      Dieldrin
72-55-9      4,4'-DDE
72-20-8      Endrin
33213-65-9   Endosulfan II
72-54-8      4,4'-DDD
1031-07-8    Endosulfan sulfate
50-29-3      4,4'-DDT
72-43-5      Methoxychlor
53494-70-5   Endrin ketone
5103-71-9    alpha-Chlordane
5103-74-2    gamma-Chlordane
8001-35-2    Toxaphene
12674-11-2   Aroclor-1016
11104-28-2   Aroclor-1221
11141-16-5   Aroclor-1232
53469-21-9   Aroclor-1242
12672-29-6   Aroclor-1248
11097-69-1   Aroclor-1254
11096-82-5   Aroclor-1260

FORM    I    PEST 

Appendix C : Data collection forms - form 1e


EPA  SAMPLE  NO. 


Lab  Name: 
Lab  Code: 


VOLATILE  ORGANICS  ANALYSIS  DATA  SHEET 
TENTATIVELY  IDENTIFIED  COMPOUNDS 


Contract: 
SAS  No. : 


Matrix:  (soil/water) 
Sample  wt/vol : 
Level:    (low/med) 
%  Moisture:  not  dec. 
Column:   (pack/cap) 


_(g/mL)_ 


Lab  Sample  ID: 
Lab  File  ID: 
Date  Received: 
Date  Analyzed: 


Dilution  Factor: 


Number  TICs  found: 


CONCENTRATION  UNITS: 
(ug/L  or  ug/Kg) 


CAS NUMBER   |   COMPOUND NAME   |   EST. CONC.   |   Q

[Blank numbered entry rows, ending at 30.]

FORM    I    VOA-TIC 

Appendix C : Data collection forms - form 1f


Lab  Name: 
Lab  Code : 


IF 

SEMIVOLATILE  ORGANICS  ANALYSIS  DATA  SHEET 
TENTATIVELY  IDENTIFIED  COMPOUNDS 


Contract: 
SAS  No. : 


EPA  SAMPLE  NO. 


Matrix:  (soil/water) 
Sample  wt/vol : 
Level:    (low/med) 
%  Moisture:  not  dec. 


_(g/mL)_


dec. 


Extraction:   (SepF/Cont/Sonc) 
GPC  Cleanup:    (Y/N) pH: 

Number  TICs  found: 


Lab  Sample  ID: 
Lab  File  ID: 
Date  Received: 
Date  Extracted :_ 
Date  Analyzed: 
Dilution  Factor: 


CONCENTRATION  UNITS: 
(ug/L  or  ug/Kg) 


CAS NUMBER   |   COMPOUND NAME   |   RT   |   EST. CONC.   |   Q   |

[Blank entry rows numbered 1. through 30.]

FORM    I    SV-TIC 

Appendix D : Example SAS files


FILE 1
HEADER INFORMATION

 #  VARIABLE  TYPE  LENGTH
14  ADATEF1   NUM    8
17  COUNITS   CHAR   5
15  COLUMN1   CHAR   4
16  DILUTION  NUM    8
24  DRD       NUM    8
19  EXTDATE1  NUM    8
20  EXTRACT1  CHAR   4
10  FILE_ID1  CHAR  14
 1  FORM      CHAR   3
21  GPC       CHAR   1
11  LEVEL     CHAR   3
 6  MATRIX    CHAR   5
23  NOTICS    CHAR   2
13  P_MOIST   NUM    8
18  P_MOISTD  NUM    8
22  PH_       NUM    8
12  RECDATE   NUM    8
 7  SAMP_ID1  CHAR  12
 8  SAMP_WT   NUM    8
 3  SAMPLE    CHAR  12
 4  SASNO     CHAR   6
 5  SDG_NO    CHAR   5
 2  SUFFIX    CHAR   2
 9  WT_UNITS  CHAR   2

FILE 2
RESULTS FILE

 #  VARIABLE  TYPE  LENGTH
 5  CAS_NO    CHAR  10
 8  DRD       NUM    8
 1  FORM      CHAR   3
 7  QUAL      CHAR   5
 6  RESULT1   NUM    8
 3  SAMPLE    CHAR  12
 4  SDGNO     CHAR   5
 2  SUFFIX    CHAR   2


FILE 3
TIC RESULTS

 #  VARIABLE  TYPE  LENGTH
 5  CASNO     CHAR  10
 9  COMPOUND  CHAR  28
11  DRD       NUM    8
 1  FORM      CHAR   3
 6  QUAL      CHAR   5
 7  RESULT2   NUM    8
10  RT        NUM    8
 3  SAMPLE    CHAR  12
 4  SDGNO     CHAR   5
 8  SEQUENCE  CHAR   2
 2  SUFFIX    CHAR   2


FILE 4
SDG INFORMATION

 #  VARIABLE  TYPE  LENGTH
 5  CASENO    CHAR   5
 2  CONTRACT  CHAR  11
 6  DRD       NUM    8
 3  LAB_CODE  CHAR   6
 1  LABNAME   CHAR  25
 7  LOADDATE  NUM    8
 4  SDG_NO    CHAR   5

FILE 5
TUNING INFORMATION

 #  VARIABLE  TYPE  LENGTH
46  ADATEF5   NUM    8
47  ATIMEF5   NUM    8
 7  COLUMN5   CHAR   4
49  DRD       NUM    8
45  FILEID5   CHAR  14
 1  FORM      CHAR   3
11  INJDF5    CHAR  10
10  INJDATE   NUM    8
12  INJTIME   NUM    8
 6  LEVEL     CHAR   3
 5  MATRIX    CHAR   5
32  ME127     NUM    8
17  ME173     NUM    8
18  ME173M    NUM    8
19  ME174     NUM    8
20  ME175     NUM    8
21  ME175M    NUM    8
22  ME176     NUM    8
23  ME176M    NUM    8
24  ME177     NUM    8
25  ME177M    NUM    8
33  ME197     NUM    8
34  ME198     NUM    8
35  ME199     NUM    8
36  ME275     NUM    8
37  ME365     NUM    8
38  ME441     NUM    8
39  ME442     NUM    8
40  ME443     NUM    8
41  ME443M    NUM    8
13  ME50      NUM    8
26  ME51      NUM    8
27  ME68      NUM    8
28  ME68M     NUM    8
29  ME69      NUM    8
30  ME70      NUM    8
31  ME70M     NUM    8
14  ME75      NUM    8
15  ME95      NUM    8
16  ME96      NUM    8
 8  PAGE      CHAR   1
48  PAGETOT   CHAR   1
44  SAMP_ID5  CHAR  12
43  SAMPLE    CHAR  12
 3  SASNO     CHAR   6
 4  SDGNO     CHAR   5
42  SEQUENCE  CHAR   2
 2  SUFFIX    CHAR   2
 9  TFILE_ID  CHAR  14


FILE 6
CALIBRATION INFORMATION

 #  VARIABLE  TYPE  LENGTH
22  AVG_RRF6  NUM    8
26  BHRCTO    NUM    8
24  CASNO     CHAR  10
 8  COLUMN6   CHAR   4
25  DRD       NUM    8
 2  FORM      CHAR   3
12  F6F_ID1   CHAR  14
13  F6F_ID2   CHAR  14
14  F6F_ID3   CHAR  14
15  F6F_ID4   CHAR  14
16  F6F_ID5   CHAR  14
10  ICDATE16  NUM    8
11  ICDATE26  NUM    8
 9  INJDF6    CHAR  10
 7  LEVEL     CHAR   3
 6  MATRIX    CHAR   5
27  PKRCTO    NUM    8
17  RRF1      NUM    8
18  RRF2      NUM    8
19  RRF3      NUM    8
20  RRF4      NUM    8
21  RRF5      NUM    8
23  RSD       NUM    8
 4  SAS_NO    CHAR   6
 5  SDGNO     CHAR   5
 1  SEQUENCE  CHAR   2
 3  SUFFIX    CHAR   2
28  VHRCTO    NUM    8

Appendix E : Error report


OARQ  ERROR  REPORT  FOR
LAB: LAB1  SDG: EU341  FORMAT: 'A'


13:47 TUESDAY, MAY 24, 19


SAMPLE=A FORM=6A


OBS   FORM    SEQUENCE  NUMBER 
SUFFIX   OR  COMPOUND  NAME 


ERROR 
NUMBER 


CURRENT  VALUE   CORRECT  VALUE   SECONDARY   COMPARISON 
IDENTIFIER   BETWEEN
FORMS 


VINYL  CHLORIDE 

TRANS- 1,3-DICHLOROPROPENE 

CHLOROMETHANE 

CHLOROMETHANE 

CHLOROMETHANE 
VINYL  CHLORIDE 
2-HEXANONE

133027
BIS(2-CHLOROETHYL)ETHER
2

133027
CHLOROETHANE

133027 


430 
903 


360 
410 


410 
430 


1070 
1220 


130 
650 
380 
140 
1201 
890 


INSTRUMENT  ID 

MS/MSD  MORE  FORM3  NEEDED

COLUMN 

NUMBER  OF  ADDITIONAL 

COMPOUND  MISSPELLED 

FILE  ID 

INITIAL  CALIBRATION  DATE 

NUMBER  OF  ADDITIONAL 

INITIAL  CALIBRATION  DATE 

INSTRUMENT  ID 

AVERAGE  RRF 

CAS  NUMBER 

RESULT 

SAMPLE  NO. 

CAS  NUMBER 

AVERAGE  RRF 

CAS  NUMBER 

CONCENTRATION  UNITS 

M/E176

FORM  NUMBER 

COLUMN 

SAMPLE  ID-BLANK  MISSING

MS  PERCENT  OUT 


MISSING 

MISSING

RECORDS  OF 

MISSING 

MISSING 

.307 

133 


UG/KG 
<>  LIMIT 


VGIK2 
FORM3777 

THIS  TYPE 

RRF20 

THIS  TYPE 


.549 

027 

1600 

EU341RE 

027 

0.606 

027 

UG/L 


NOT      DISPLAYED    34 


NOT      DISPLAYED    182 


SAMPLE=B FORM=6A


OBS   FORM    SEQUENCE  NUMBER 
SUFFIX  OR  COMPOUND  NAME 


1     AA    TRICHLOROETHANE 


ERROR   ERROR 
NUMBER 


960     MSD  RPD


CURRENT  VALUE   CORRECT  VALUE   SECONDARY   COMPARISON 
IDENTIFIER   BETWEEN
FORMS 

ISSC Stein Rokkan Prize


Announcing  the  ISSC  STEIN  ROKKAN  PRIZE  in  Comparative  Research 


The  International  Social  Science  Council,  in  conjunction  with  the  Conjunto  Universitario 
Candido  Mendes  (Rio  de  Janeiro)  announces  that  the  fifth  STEIN  ROKKAN  PRIZE  will  be 
awarded  in  December  1990. 

The prize is intended to reward a very substantial and original contribution in comparative
social science research by a scholar under forty years of age on 31st December
1990. It can be either an unpublished manuscript of book length or a printed book or
collected works published after December 1987.


Four  copies  of  manuscripts  typed  double  space  or  of  printed  works  should  be  delivered  to 
the International Social Science Council before 15 March 1990, together with a formal
letter of application with evidence of the candidate's age attached. Work submitted will
be  evaluated  by  the  International  Social  Science  Council  with  the  assistance  of  appropriate 
referee  or  referees. 


The  AWARD  will  be  made  at  the  ISSC  General  Assembly  meeting  in  December  1990.   Its 
decision  is  final  and  not  subject  to  appeal  or  revision. 

The Prize is US dollars 2,000. It may be divided between two or more applicants, should it
be found difficult to adjudicate between equally valuable works submitted.


For  further  enquiries,  please  write  to: 

The  Secretary  General 
International  Social  Science  Council 
UNESCO - 1, rue Miollis
75015  PARIS,  France 


International Social Science Council/Conseil International des Sciences Sociales




PRELIMINARY       ANNOUNCEMENT 


The  Energy  Division  of  the  Oak  Ridge  National  Laboratory  and  the  United  States  Census 
Bureau  have  pledged  some  support  for  a  small  conference  or  workshop  on  ADVANCED 
COMPUTING  FOR  THE  SOCIAL  SCIENCES.  Thus  the  following  applications  seem  most 
suitable: 

Economics
Planning
Sociology
Geography and urban studies
Transportation studies
Policy analysis
Government


Topic  areas  include,  but  are  not  limited  to: 

Supercomputing
Parallel processing
Satellite tracking & imagery
Expert systems
Natural language processing
Databases and information retrieval
Computer networks
Advanced microcomputer applications

The  conference  is  tentatively  scheduled  for  Williamsburg,  Va.  in  mid  to  late  1989.    No  fee 
schedule  has  been  proposed;  however,  any  conference  fees  will  be  held  to  a  reasonable  level 
to  permit  the  broadest  possible  participation. 

Contributed papers will be accepted at a later date after submission details have been
finalized. If you have an idea for additional topic areas, wish to receive additional
conference information as it becomes available, or are interested in submitting a paper,
please contact

Lloyd  F.  Arrowood 

Oak  Ridge  National  Laboratory 

P.  O.  Box  2008 

Oak Ridge, Tennessee 37831-6207

(615)-574-8700 

LFA@ORNLSTC.BITNET

or 

LFA@STC10.CTD.ORNL.GOV 
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University of Houston


Center for Public Policy
Houston, Texas 77204-5341
713/749-7141

College of Social Sciences


CALL FOR PAPER PROPOSALS


ELECTING  THE  SENATE 


Hosted by the Departments of Political Science at the University of Houston and Rice
University, ELECTING THE SENATE, a conference, will be held December 1 & 2, 1989, in
Houston under the auspices of the Center for Public Policy and the Rice Institute for
Policy Analysis. The conference will bring together scholars with interests in
congressional elections to discuss and analyze the 1988 National Elections Study Survey
and to identify changes for the proposed 1990 and 1992 waves of the survey.

Proposed  panel  topics: 

o  The  role  of  issues  in  Senate  campaigns 

o  Incumbency  and  Senate  elections 

o  Small-state/large-state differences

o  House-Senate elections comparisons

o  Campaign  financing 

o  Six-year  election  cycle 


Those interested in participating in the conference are invited to submit paper proposals
or suggest panel topics. Funding is available to cover participants' expenses.


Proposals  should  be  submitted  no  later  than  June  1  and  should  be  sent  to: 


Professor  Bruce  Oppenheimer 
Department  of  Political  Science 
University  of  Houston 
Houston,  Texas  77204-3474 


Professor  John  Alford 
Department  of  Political  Science 
Rice  University 
P.  O.  Box  1572 
Houston,  Texas  77251. 


A  joint  project  of 
The University of Houston Center for Public Policy and The Rice Institute for Policy Analysis
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IASSIST


Membership 
form 


The International Association for
Social Science Information Services
and Technology (IASSIST) is an
international association of
individuals who are engaged in the
acquisition, processing, maintenance,
and distribution of machine readable
text and/or numeric social science
data. The membership includes
information system specialists, data
base librarians or administrators,
archivists, researchers, programmers,
and managers. Their range of interests
encompasses hard copy as well as
machine readable data.

Paid-up members enjoy voting rights
and receive the IASSIST
QUARTERLY. They also benefit
from reduced fees for attendance
at regional and international
conferences sponsored by
IASSIST.

Membership  fees  are: 

Regular Membership: $20.00 per calendar year.

Student Membership: $10.00 per calendar year.

Institutional subscriptions to the
quarterly are available, but do not
confer voting rights or other
membership benefits.

Institutional Subscription: $35.00
per calendar year (includes one
volume of the Quarterly)




I would like to become a member
of IASSIST. Please see my choice
below:

$20 Regular Membership
$10 Student Membership
$35 Institutional Membership

My primary interests are:

Archive Services/Administration
Data Processing/Data Management
Research Applications
Other (specify)




Please make checks
payable to IASSIST and
mail to:

Ms Jackie McGee
Treasurer, IASSIST
c/o Rand Corporation
1700 Main Street
Santa Monica


Name/phone


Institutional  Affiliation 
Mailing  Address 


City 


Country/zip/postal code
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Library & Information Science


•  CURRENT  RESEARCH  is  an  international  quarterly  journal 
offering  a  unique  current  awareness  service  on  research 
and  development  work  in  library  and  information  science, 
archives,  documentation  and  the  information  aspects  of 
other  fields 

•  The  journal  provides  information  about  a  wide  range  of 
projects,  from  expert  systems  to  local  user  surveys.  FLA 
and  doctoral  theses,  post-doctoral  and  research-staff  work 
are  included 

•  Each  entry  provides  a  complete  overview  of  the  project, 
the  personnel  involved,  duration,  funding,  references,  a 
brief  description  and  a  contact  name.  Full  name  and 
subject  indexes  are  included 

•  Other  features  include  a  list  of  student  theses  and 
dissertations  and  a  list  of  funding  bodies.  Each  quarter,  an 
area  of  research  is  highlighted  in  a  short  article 


CURRENT  RESEARCH  is  available  on  magnetic  tape,  as  well 
as  hard  copy,  and  can  be  searched  online  on  File  61  (SF  =  CR) 
of  DIALOG 


Subscription:  UK  £86.00 

Overseas  (excluding  N.  America)  £103.00 

N. America US$195.00

Write  for  a  free  specimen  copy  to 

Sales  Department 

Library  Association  Publishing 

7  Ridgmount  Street 
London  WC1E  7AE 
Tel:  01  636  7543x360 


Library  &  Information 
Science  Abstracts 

•  international scope and unrivalled coverage

LISA provides English-language abstracts of material in over
thirty languages. Its serial coverage is unrivalled; 550 titles
from 60 countries are regularly included and new titles are
frequently added

•  rapidly expanding service which keeps pace with
developments

LISA is now available monthly to provide a faster-breaking
service which keeps the user informed of the rapid changes in
this field

•  extensive range of non-serial works

including British Library Research and Development
Department reports, conference proceedings and
monographs

•  wide subject span

from special collections and union catalogues to word
processing and videotex, publishing and reprography

•  full  name  and  subject  indexes  provided  in  each  issue 

abstracts  are  chain-indexed  to  facilitate  highly  specific  subject 
searches 

•  available  in  magnetic  tape,  conventional  hard-copy  format, 
online  (Dialog  file  61)  and  now  on  CD-ROM 

Twelve  monthly  issues  and  annual  index 


Subscription:  UK  £157.00 

Overseas  (excluding  N.  America)  £188.00 

N.  America  US$357.00 

Write  for  a  free  specimen  copy  to 

Sales  Department 

Library  Association  Publishing 

7  Ridgmount  Street 
London  WC1E  7AE 
Tel:  01  636  7543x360 
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ANNOUNCEMENT

The following is an excerpt from an announcement appearing in the May 1989 issue of the
ANTHROPOLOGY NEWSLETTER:

The Bureau of the Census invites proposals from ethnographers of American minority
communities to participate through joint statistical agreements in research evaluating the
behavioral causes of census undercounts. Parallel field research at 30 different sites
nationwide is planned for the decennial census. Proposals should be received by no later
than Tuesday, August 1, 1989.

The research of participating social scientists will be to conduct an
independent observation-based enumeration of a small area that will be matched to census
results and to explain coverage problems in terms of local sociocultural dynamics.

The joint statistical agreement (JSA) is the Census' vehicle for funding
participation in this project... [it] is a cost sharing arrangement between the Bureau of the
Census and a nonprofit organization.

Participating principal investigators will conduct the alternative enumeration and coverage
evaluation personally. Preferably, applicants should hold a PhD in anthropology, sociology,
human geography or a related discipline. Beyond academic credentials, criteria for selection
favor researchers who have sustained a relationship with the community proposed for
study.

Proposals should include the following materials: (1) a 2-3 page description of the
sociocultural characteristics and location of the study site proposed and the applicant principal
investigator's and sponsoring organization's prior research or other relationships (such as
providing social services) with the particular neighborhood proposed as a site; (2) a vita; (3)
a letter from a nonprofit organization agreeing to sponsor the study; and (4) a preliminary
budget.

Send proposals to:

Center for Survey Methods Research
Bureau of the Census
Room 433
Washington Plaza Bldg
Washington, DC 20233
Attn: L Brownrigg & L Shinagawa
TEL: 301/763-7976

Appropriate proposals that arrive early, in May-June 1989, will have an edge.
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